JOURNAL OF
EXPERIMENTAL EDUCATION


March, 1934    Volume II    Number 3


EDUCATIONAL MEASUREMENTS 


CONTENTS
The Derivation of Norms: S. A. Courtis 237
Some Assumptions Involved in Personality Measurement: Gordon Hendrickson 243
The Economical Collection of Data for Test Validation: Paul Horst 250
Item Analysis by the Method of Successive Residuals: Paul Horst 254
Improved Overlapping Methods for Determining Validities of Test Items: John A. Long 264
The Negative Suggestion Effect of the False Statement in the True-False Test: Howard Y. McClusky 269
A Graphical Method for Computing the Standard Error of Biserial r: Walter J. McNamara and Jack W. Dunlap 274
A New Technique for Machine Computation of Coefficients of Correlation: Marc J. Feldstein 278
The Construction and Interpretation of Differential Ability Patterns: David Segel 283
The Insignificance of Significant Differences: Edward A. Lincoln 288
Predicting Stabilized Salary-Schedule Costs: Douglas E. Scates 291
"Perseveration" in a Group of Subnormal Children: K. H. Rogers 301
An Analysis of the Scores of Eighth-Grade Pupils and Normal-School Students on Certain Objective Tests: C. C. Upshall and Harry V. Masters 310


PUBLISHED QUARTERLY $1.50 a Copy 


EDWARDS BROTHERS, INC. 


LITHOPRINTERS AND PUBLISHERS 
ANN ARBOR, MICHIGAN 


Application filed for entry as second class matter. Printed in U.S.A.





EDITORIAL BOARD
JOURNAL OF EXPERIMENTAL EDUCATION

A. S. Barr, Chairman, Professor of Education, University of Wisconsin, Madison, Wisconsin.

Carter V. Good, Professor of Education, University of Cincinnati, Cincinnati, Ohio. Editorially responsible for materials on supervision and psychology of learning and teaching.

Walter S. Monroe, Professor of Education, University of Illinois, Urbana, Illinois. Editorially responsible for materials on measurements, statistics, and methods of experimental research.

Henry Harap, Associate Professor of Education, Western Reserve University, Cleveland, Ohio. Editorially responsible for materials on experimental studies of curriculum construction.

George D. Stoddard, Director, Child Welfare Research Station, State University of Iowa, Iowa City, Iowa. Editorially responsible for materials on child welfare, guidance, and development.


CONTRIBUTING EDITORS

Harry J. Baker, Director, Psychological Clinic, Detroit Public Schools, Detroit, Michigan.
W. E. Blatz, Nursery School Division, University of Toronto, Toronto 5, Canada.
William F. Book, Head, Department of Psychology and Philosophy, Indiana University, Bloomington, Indiana.
Fowler D. Brooks, Head, Departments of Education and Psychology, DePauw University, Greencastle, Indiana.
William A. Brownell, Professor of Educational Psychology, Duke University, Durham, North Carolina.
Leo J. Brueckner, Professor of Education, University of Minnesota, Minneapolis, Minnesota.
Herbert B. Bruner, Professor of Education, Teachers College, Columbia University, New York City.
Barbara S. Burks, Psychologist, Institute of Child Welfare, University of California, Berkeley, California.
Otis W. Caldwell, Professor of Education, Teachers College, Columbia University, New York City.
Ellsworth Collings, Dean, College of Education, University of Oklahoma, Norman, Oklahoma.
Philip W. L. Cox, Professor of Secondary Education, New York University, New York City.
Edgar A. Doll, Director of Research, Training School, Vineland, New Jersey.
Harl R. Douglass, Professor of Education, University of Minnesota, Minneapolis, Minnesota.
Jack W. Dunlap, Assistant Professor of Education, ..., New York City.
Paul H. Furfey, Professor of Psychology, Catholic University of America, Washington, D. C.
Florence L. Goodenough, Professor, Institute of Child Welfare, University of Minnesota, Minneapolis, Minn.
M. E. Herriott, Assistant Director, Division of Psychology and Educational Research, Public Schools, Los Angeles, California.
Karl J. Holzinger, Professor of Education, University of Chicago, Chicago, Illinois.
L. Thomas Hopkins, Curriculum Specialist, Lincoln School of Teachers College, 425 West 123rd Street, New York City.
C. L. Huffaker, Professor of Education, Bureau of Educational Research, University of Oregon, Eugene, Ore.
Kai Jensen, Assistant Professor of Education, University of Wisconsin, Madison, Wisconsin.
Harold E. Jones, Director of Research, Institute of Child Welfare, University of California, Berkeley, California.
Edward A. Lincoln, Assistant Professor of Education, Graduate School of Education, Harvard University, Cambridge, Massachusetts.
E. F. Lindquist, Professor of Education, State University of Iowa, Iowa City, Iowa.
Lois Hayden Meek, Professor of Education, Director, Child Development Institute, Teachers College, Columbia University, New York City.
C. W. Odell, Associate Professor of Education, University of Illinois, Urbana, Illinois.
Willard C. Olson, Director of Research in Child Development, University of Michigan, Ann Arbor, Mich.
W. E. Peik, Associate Professor of Education, University of Minnesota, Minneapolis, Minnesota.
S. L. Pressey, Professor of Educational Psychology, Ohio State University, Columbus, Ohio.
W. H. Pyle, Professor of Educational Psychology, Detroit Teachers College, Detroit, Michigan.
Clarence E. Ragsdale, Assistant Professor of Education, University of Wisconsin, Madison, Wisconsin.
Paul T. Rankin, Supervising Director, ..., Detroit, Michigan.
H. H. Remmers, Director, Division of Educational Reference, Professor of Education and Psychology, Purdue University, Lafayette, Indiana.
G. M. Ruch, Professor of Education, University of California, Berkeley, California.
Earl U. Rugg, Head, Department of Education, Colorado State Teachers College, Greeley, Colorado.
Peter Sandiford, Professor of Educational Psychology, Director, Educational Research, University of Toronto, Toronto, Canada.
Douglas E. Scates, Director, School Research, Cincinnati Public Schools, Cincinnati, Ohio.
Raleigh Schorling, Professor of Education, Supervisor, Directed Teaching and Instruction in University High School, University of Michigan, Ann Arbor, Michigan.
Mandel Sherman, Associate Professor of Education, University of Chicago, Chicago, Illinois.
Helen Thompson, Research Associate, Yale University, New Haven, Connecticut.
T. L. Torgerson, Assistant Professor of Education, University of Wisconsin, Madison, Wisconsin.
M. R. Trabue, 603 East Franklin Street, Chapel Hill, North Carolina.
R. W. Tyler, Professor of Education, Ohio State University, Columbus, Ohio.
Douglas Waples, Graduate Library School, University of Chicago, Chicago, Illinois.
Beth L. Wellman, Research Associate Professor, Child Welfare Research Station, State University of Iowa, Iowa City, Iowa.































THE DERIVATION OF NORMS¹


by 


S. A. Courtis 
University of Michigan 


That measurement plays a fundamental 
part in scientific investigation is ad- 
mitted by all, but strangely enough, mere 
measurement yields but barren facts. Of
what significance is it that a child makes 
a score of ten in a spelling test? Measure- 
ments take on meaning only as they are re- 
ferred to norms. Thus, if the child is a 
boy six years old, and a score of ten in the 
particular test used is the median score for 
boys six years old, and further if there 
are no known distorting factors such as 
special training to be allowed for, we judge 
the boy to be making normal progress in 
spelling. 
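Referring a raw score to an age-group norm, as in the spelling example, can be sketched in a few lines. This is a minimal illustration that assumes the norm is the median of the age group's scores; the scores and the helper name `refer_to_norm` are hypothetical, and the optional tolerance for calling a score "normal" is an assumption rather than anything the author specifies.

```python
from statistics import median

# Hypothetical spelling scores for a group of six-year-old boys (invented data).
SIX_YEAR_OLD_BOYS = [6, 7, 8, 9, 10, 10, 11, 12, 13, 14, 15]


def refer_to_norm(score, group_scores, tolerance=0.0):
    """Compare a raw score with the median of an age group.

    Returns "above", "at", or "below". The tolerance is a hypothetical
    allowance for treating nearby scores as normal."""
    norm = median(group_scores)
    if score > norm + tolerance:
        return "above"
    if score < norm - tolerance:
        return "below"
    return "at"


print(median(SIX_YEAR_OLD_BOYS))             # 10
print(refer_to_norm(10, SIX_YEAR_OLD_BOYS))  # at
```

The verdict is only as trustworthy as the score and the norm group behind it, which is the point the following paragraph makes.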

It follows, from the ideas presented 
above, that the meanings attached to meas- 
urements are no more valid than the meas- 
urements and the norms from which they are 
derived. Thus, if the boy's ability was
really twenty, but the breaking of his pen- 
cil in the middle of the test reduced his 
score to ten, the judgment of “normal prog- 
ress" is invalid. Similarly, if the norm of 
ten was based on the measurement of superior 
children only, "normal" progress must be 
changed to "superior" progress. There is no 
need to press the point; it is self-evident, 
but it is frequently disregarded. 

The norms most frequently used in the measurement and interpretation of growth are central tendencies of successive age groups. Thus, Baldwin gives the mean heights and weights of boys and girls at various ages.² But if we compare the growth curve of an individual³ in height with Baldwin's norms (Figure 1), it is apparent at once that the individual curve differs markedly in form from Baldwin's curve of means. That this difference is in no way the fault of Baldwin's norms is proved by the inclusion in the figure of a second curve of means from an independent investigation.⁴ At no point do the means differ by more than a third of one standard deviation. To be sure, the individual curve was selected to illustrate in extreme form a common tendency in individual growth curves to vary from the mean, but the character of the variation is truly typical. Considerations such as these ultimately lead one to ask, "Does a growth curve based on means of successive age groups reveal most truly the nature and course of the growth process?"

[Figure 1. Comparison of Group Means with Measurements of an Individual. Individual: Case M 825, Harvard Growth Study. Group curves: 9,262 boys; U.S., Italy, England, Switzerland, Scotland. Horizontal axis: Age in Months.]

It may help to make the problem clear 
if the individual growth curve shown in 





1. A paper read before Section Q, Education, of the American Association for the Advancement of Science at Atlantic City, December 29, 1932.

2. Bird T. Baldwin, The Physical Growth of Children from Birth to Maturity. University of Iowa Studies in Child Welfare, No. 1. Iowa City, Iowa: Child Welfare Research Station, University of Iowa.

3. Throughout this report, use will be made of data from the Harvard Growth Study. My very great indebtedness to Drs. Psyche Cattell and Walter Dearborn for making these data available for my study is gratefully acknowledged.

4. S. A. Courtis, International Comparisons in Child Development. (In preparation for publication.)























238    JOURNAL OF EXPERIMENTAL EDUCATION    Volume II, No. 2


Figure 1 is analyzed into component factors. The discovery of the fundamental law of growth¹ makes such analysis possible.

The growth of human beings between the ages of six and eighteen is a two-cycle affair (Figure 2). The first cycle will be called the growth of "infancy" and the second the growth of "adolescence." For instance, during infancy the individual shown in Figure 1 was growing towards a maximum of 56 inches, and the data (except for the initial measurement) may be rather precisely represented by the isochronic equation²

$$y = 56\,[0.4575t + 3.79]$$

Median deviation, ±0.07 inches.

At about 89 months, a second cycle of growth begins to make itself evident. The isochronic equation for this cycle is

$$y = 14\,[0.4319t - 38.39]$$

Median deviation, ±0.31 inches.

The equation for the entire curve is the sum of these two equations; namely,

$$y = 56\,[0.4575t + 3.79] + 14\,[0.4319t - 38.39]$$

Median deviation, ±0.25 inches.

[Figure 2. Analysis of Individual Growth Curve in Height. Case M 825, Harvard Growth Study. First cycle (infancy): $y = 56\,[0.4575t + 3.79]$; second cycle (adolescence): $y = 14\,[0.4319t - 38.39]$; differences between computed and actual values are also shown. Axes: Age in Months; Inches.]
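The structure of the whole-curve equation, a sum of cycles each of the form k[rt + c], can be sketched as follows. The bracket stands for Courtis's tabled isochronic function, which this section does not define (footnote 2 offers a table of isochrons by mail), so the sketch takes that function as a parameter. The assumption that k multiplies a fraction of maximum returned by the bracket, the class name `Cycle`, and the crude straight-line `ramp` used only so the example runs are all hypothetical; the ramp's numbers say nothing about real heights.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Cycle:
    """One growth cycle written as k * [r*t + c]."""
    k: float  # maximum toward which the cycle proceeds
    r: float  # rate constant multiplying age t in months
    c: float  # additive constant inside the brackets

    def value(self, t: float, iso: Callable[[float], float]) -> float:
        # Assumption: k multiplies the fraction of maximum that the
        # isochronic function returns for the bracketed argument.
        return self.k * iso(self.r * t + self.c)


# Constants for Case M 825 (height in inches), as given in the text.
INFANCY = Cycle(k=56.0, r=0.4575, c=3.79)
ADOLESCENCE = Cycle(k=14.0, r=0.4319, c=-38.39)


def two_cycle(t: float, iso: Callable[[float], float],
              cycles: Sequence[Cycle] = (INFANCY, ADOLESCENCE)) -> float:
    """Whole-curve equation: the sum of the individual cycle equations."""
    return sum(cycle.value(t, iso) for cycle in cycles)


def ramp(x: float) -> float:
    """Hypothetical stand-in for the isochron table: a straight ramp
    from 0 at argument 0 to 1 at argument 100, clipped outside."""
    return min(max(x / 100.0, 0.0), 1.0)


print(round(two_cycle(200, ramp), 2))
```

Keeping the isochronic function as an argument means the same code would accept a faithful implementation of Courtis's table if one were substituted for `ramp`.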


The standard error of estimate of this 
equation is 1.125 inches, the largest of 
any equation for individual curves used in 
this report. The average standard error of 
estimate of twenty-five curves covering a 
nine-year period of re-measurement was 
0.44 inches.

In addition to "time", the significant 
factors in any growth curve are three in 
number: (1) the maximum toward which the 
growth proceeds; (2) the rate of growth; 
and (3) the degree of development at the 
beginning of the growth cycle. The fact 
that the human growth curve, however, is a 
two-cycle curve during the period being 
studied introduces a new and an important 
factor, the time when adolescent growth be- 
gins (Figure 3). 


[Figure 3. Effect Upon Form of Curve of Time when Adolescent Growth Begins. Curve B represents Case M 825, Harvard Growth Study. Equations for Curves A to D: $y = 56\,[0.4575t + 3.79] + 14\,[0.4319t - c]$, with $c$ = 48.75 (A), 38.39 (B), 28.03 (C), and 17.17 (D). Times of beginning of adolescent growth are marked on each curve. Axes: Age in Months; Inches.]


In the equation for the growth curve as a whole, the time of the beginning of the second cycle is represented by the last constant in the equation. Hence in computation this factor can be varied at will. The variational index curves A, B, C, D in Figure 3, in which this constant is given the values 48.75, 38.39, 28.03, and 17.17 in turn, are therefore the effects produced by variation in this one factor only. They accurately portray patterns of growth actually found in different individuals. No one would suspect that curve D is a complex curve, but it is.

1. S. A. Courtis, "Maturation Units for the Measurement of Growth," School and Society, November 1929.

2. A copy of a table of isochrons and instructions for computing an isochronic equation for an individual curve for height, weight, etc., will be sent to any reader of this magazine in return for 10¢ in stamps. Address the author at the University of Michigan, Ann Arbor, Michigan.
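If one assumes that a cycle begins where its bracketed argument rt + c equals zero (an inference, not a statement of the source), the last constant converts directly into an age of onset. The assumption is at least consistent with the text: for Curve B it gives 38.39 / 0.4319 ≈ 88.9 months, matching the "about 89 months" quoted for Case M 825. The sketch applies Case M 825's adolescent rate constant, 0.4319, to all four curves, on the further assumption that only the last constant varies among them.

```python
# Rate constant of the adolescent cycle for Case M 825 (from the text).
RATE = 0.4319

# Last constant of the adolescent cycle for Curves A-D of Figure 3.
# It enters the equation with a minus sign: 14[0.4319t - constant].
LAST_CONSTANTS = {"A": 48.75, "B": 38.39, "C": 28.03, "D": 17.17}


def onset_age(rate: float, constant: float) -> float:
    """Age in months at which rate*t - constant equals zero.

    Hypothetical reading: a cycle is taken to begin where its
    bracketed argument is zero."""
    return constant / rate


for curve, constant in LAST_CONSTANTS.items():
    print(curve, round(onset_age(RATE, constant), 1))
```

Under this reading the four curves begin adolescent growth at roughly 113, 89, 65, and 40 months, all inside the 23 to 117 month range reported for the twenty-five cases.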

In Figure 3, Curve B is the same curve shown in Figures 1 and 2 and represents actual measurements. Curve A illustrates what happens when adolescence is so delayed that the first cycle of growth is almost completed before the second begins. Curve D represents the opposite extreme: adolescent growth began long before the growth of infancy had reached its maximum. In the twenty-five individuals used in this investigation, the age at which adolescent growth began¹ varied from 23 months to 117 months, the mean age being 73 months. Therefore, when age means are used as norms, these evidences of the existence of two cycles of growth tend to "average out." The result is that norms based on means, for example the norms shown in Figure 1, give very little hint of the real nature of the growth process in the individual.
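The "averaging out" can be illustrated numerically. The sketch below takes only the adolescent cycle, with Case M 825's maximum (14 inches) and rate (0.4319), and gives four hypothetical individuals the onset constants of Curves A to D. Because the true isochron table is not reproduced here, a logistic curve is swapped in as a stand-in for the isochronic function; it is an assumption chosen only for its S shape. The comparison is the largest one-month gain in each individual curve versus the largest one-month gain in the curve of their means.

```python
import math

RATE = 0.4319   # adolescent rate constant, Case M 825 (from the text)
K = 14.0        # adolescent maximum in inches, Case M 825 (from the text)
LAST_CONSTANTS = [-48.75, -38.39, -28.03, -17.17]  # Curves A-D, Figure 3


def iso(x: float) -> float:
    """Hypothetical stand-in for the isochron table: a logistic that climbs
    from near 0 to near 1 as its argument runs from 0 to 100."""
    return 1.0 / (1.0 + math.exp(-(x - 50.0) / 10.0))


def adolescent_growth(t: float, c: float) -> float:
    return K * iso(RATE * t + c)


def peak_monthly_gain(values):
    """Largest one-month increment in a curve sampled once a month."""
    return max(later - earlier for earlier, later in zip(values, values[1:]))


ages = range(0, 301)  # months
individual_curves = [[adolescent_growth(t, c) for t in ages]
                     for c in LAST_CONSTANTS]
# Curve of means: average the four individuals' values at each age.
mean_curve = [sum(vals) / len(vals) for vals in zip(*individual_curves)]

individual_peaks = [peak_monthly_gain(curve) for curve in individual_curves]
mean_peak = peak_monthly_gain(mean_curve)
print("individual peaks:", [round(p, 3) for p in individual_peaks])
print("mean-curve peak:", round(mean_peak, 3))
```

Every individual shows the same sharp spurt, only at a different age, while the mean curve's steepest month is noticeably flatter: the spurts are smeared across the spread of onset ages, which is the sense in which means hide the second cycle.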

Keeping these facts in mind, let us turn now to a consideration of some of the different methods by which norms are, or might be, obtained (Figure 4). One conventional method is to measure at a given time all the subjects available, classifying their measurements in age groups and finding the mean value for each age. This method assumes that the individuals in each age group are what the individuals in the preceding groups will become in time, and it ignores the factor of selection, which, however, is constantly operating and may distort the mean values.

[Figure 4. Different Methods of Obtaining Norms. (1) Conventional, direct method: simultaneous measurement of successive, comparable groups of different individuals; growth curve based on means of values at each interval (Curve of Means). (2) Remeasurement of identical individuals at intervals during the entire growth period; growth curve based on means of constants of individual curves (Curve of Constants).]

A better method would be to remeasure a given set of individuals at regular intervals during the entire period of growth and then average the results. This is what the Harvard Growth Study is attempting to do. The time and care required to standardize measurements over a long period, however, constitute one serious obstacle, which renders the method impractical for general use.

Even with a series of nine annual measurements for identical individuals in hand, the question still arises, "How shall norms be derived?" For a curve based on means might still fail to portray the essential characteristics of the growth process. An alternative method suggested itself. One might average the "constants" of a series of individual growth curves and build a composite growth curve from the results. Such a curve will be called a "Curve of Constants."

The purpose of the investigation being reported is to compare the "Curve of Means" with the "Curve of Constants" in order to determine which one reveals more clearly the general pattern of the growth curve. Twenty-five individual records over a nine-year period² were selected for detailed study. The number was set at twenty-five solely because of the time and money costs involved. The bases of selection were (1) that the individuals should come from the same racial stock (the parents of all twenty-five were born in Italy), and (2) that the growths should be sufficiently regular to yield equations having small standard errors of estimate. The actual measurements used were height, weight, and number of permanent teeth cut. Measurements of intelligence were also available.





1. Adolescent growth is considered to begin when its contribution to the total growth amounts to 0.000 000 189 per cent 
of its own maximum. 


2. Seven years in two cases. 



































































The specific plan of procedure was as follows:

Step 1. An equation for the growth of each individual in each trait was derived from the actual measurements.

Step 2. These equations were solved for values at given exact ages.

Step 3. Means of the values at each age were found for each trait. (Curve of Means)

Step 4. For each trait also, a mean value was found for each constant in the equations by averaging the values of corresponding constants in the different individual equations.

Step 5. For each trait, an equation was constructed from the mean constants and a growth curve computed. (Curve of Constants)

Step 6. The Curve of Means was compared with the Curve of Constants.
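Steps 2 to 6 can be sketched on a toy scale to show why the two curves need not agree. Everything numerical here is hypothetical: three invented individuals with single-cycle constants (k, r, c), a logistic stand-in for the isochronic function (the real isochron table is not given in this section), and a straight-line "law" kept only for contrast. Only the procedure, averaging values age by age versus averaging constants and then computing one curve, follows the text.

```python
import math
from statistics import mean

# Hypothetical single-cycle constants (k, r, c) for three individuals.
# Invented for illustration; they are not values from the study.
INDIVIDUALS = [(50.0, 0.30, 10.0), (56.0, 0.45, 4.0), (60.0, 0.40, -5.0)]
AGES = [72, 96, 120, 144, 168]  # Step 2: exact ages in months


def logistic_iso(x):
    """Hypothetical nonlinear stand-in for the isochron table."""
    return 1.0 / (1.0 + math.exp(-(x - 50.0) / 10.0))


def linear_iso(x):
    """A straight-line 'law', used only for contrast."""
    return x / 100.0


def curve(constants, iso, ages):
    """Step 2: solve y = k * iso(r*t + c) at each given age."""
    k, r, c = constants
    return [k * iso(r * t + c) for t in ages]


def curve_of_means(individuals, iso, ages):
    """Step 3: average the individuals' values age by age."""
    values = [curve(ind, iso, ages) for ind in individuals]
    return [mean(at_age) for at_age in zip(*values)]


def curve_of_constants(individuals, iso, ages):
    """Steps 4-5: average each constant, then compute one curve from them."""
    mean_constants = tuple(mean(column) for column in zip(*individuals))
    return curve(mean_constants, iso, ages)


# Step 6: compare the two curves, age by age.
means = curve_of_means(INDIVIDUALS, logistic_iso, AGES)
constants = curve_of_constants(INDIVIDUALS, logistic_iso, AGES)
for t, m_val, c_val in zip(AGES, means, constants):
    print(f"{t:>4} months  means={m_val:6.2f}  constants={c_val:6.2f}")
```

With a common maximum and the straight-line law, the two procedures agree exactly; they part company because the growth law is nonlinear and k multiplies the bracket. Whether the Curve of Constants lies above the Curve of Means, as the author reports for the real data, depends on the actual isochronic function and constants, which this toy does not reproduce.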





The various steps in the process will be illustrated. The first graph (Figure 5) presents the three growth curves for one individual and the equations derived from the data. Seventy-five such equations were obtained from the data on which the study is based.

[Figure 5. Illustration. Step 1. Derive Equations for Each Individual Growth Curve. Case M 825, Harvard Growth Study. Growth curves for height, weight, and teeth, each with its isochronic equation and standard error of estimate. Horizontal axis: Age in Months.]
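The section does not spell out how each equation in Step 1 was fitted, beyond noting that two points in each cycle of each curve served as bases (footnote 2 offers instructions by mail). One way such a two-point fit could work is sketched below, under loudly stated assumptions: the maximum k is taken as already known, and a logistic stand-in with an exact inverse replaces the unpublished isochron table. Inverting the function at the two base points gives two linear equations in r and c, so the fitted curve passes exactly through those points, which illustrates why base points produce "forced" zero errors.

```python
import math


def iso(x):
    """Hypothetical logistic stand-in for the isochron table."""
    return 1.0 / (1.0 + math.exp(-(x - 50.0) / 10.0))


def iso_inverse(p):
    """Argument x at which iso(x) == p, for 0 < p < 1."""
    return 50.0 + 10.0 * math.log(p / (1.0 - p))


def fit_cycle(k, base1, base2):
    """Solve y = k * iso(r*t + c) for r and c from two (age, value) base
    points, taking the maximum k as already known (an assumption)."""
    (t1, y1), (t2, y2) = base1, base2
    x1, x2 = iso_inverse(y1 / k), iso_inverse(y2 / k)
    r = (x2 - x1) / (t2 - t1)
    c = x1 - r * t1
    return r, c


# Generate two base points from known constants, then recover them.
k_true, r_true, c_true = 56.0, 0.45, 4.0
bases = [(t, k_true * iso(r_true * t + c_true)) for t in (72, 144)]
r_fit, c_fit = fit_cycle(k_true, *bases)
print(round(r_fit, 4), round(c_fit, 4))
```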







A question immediately arises, "How adequately do such equations describe the individual growth process?" The answer is found in a study of the deviations of the computed from the actual values. These are shown in full for height, and by summary statements for weight and teeth (Figure 6).

[Figure 6. Frequency of Errors, Computed Values minus Actual Measurements. Full frequency distribution for height; summary for height, weight, and teeth, including zero errors (height 117, teeth 109) and mean error without sign (height .60, weight .76, teeth .37).]


The total number of items of data in each trait was 223. Approximately half the errors were zero or less than one-tenth of the unit of measurement (shown in black). One hundred of these zero errors are "forced" errors, because two points in each cycle of each curve are used as bases in obtaining the equation. The remaining errors are for the most part within half a unit of the mean. The errors are slightly skewed, probably partly because of a tendency to underestimate the maxima, and partly because of errors in the conditions of measurement themselves.
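The summary statistics named in Figure 6 can be computed as below. This is a sketch on invented heights, not the study's data. Errors are taken as computed minus actual, following the figure caption; "zero" errors are those under one-tenth of the unit, as in the text. The standard error of estimate is computed here as the root mean square error, which is one common definition; the author's exact formula is not stated, and some texts divide by n minus the number of fitted constants instead. The count of forced zeros per trait also checks out arithmetically: 25 curves × 2 cycles × 2 base points = 100.

```python
import math


def error_summary(computed, actual, unit=1.0):
    """Summarise errors of computed minus actual values.

    Returns (mean error with sign, mean error without sign,
    share of errors under one-tenth of the unit, standard error of estimate).
    The standard error of estimate is taken here as the root mean square
    error; other definitions divide by a smaller number of degrees of freedom."""
    errors = [c - a for c, a in zip(computed, actual)]
    n = len(errors)
    mean_signed = sum(errors) / n
    mean_unsigned = sum(abs(e) for e in errors) / n
    share_near_zero = sum(abs(e) < 0.1 * unit for e in errors) / n
    see = math.sqrt(sum(e * e for e in errors) / n)
    return mean_signed, mean_unsigned, share_near_zero, see


# Invented heights in inches, for illustration only.
computed = [48.0, 50.05, 52.3, 54.0]
actual = [48.0, 50.0, 52.0, 54.5]
print(error_summary(computed, actual))

# Forced zeros per trait: 25 curves x 2 cycles x 2 base points.
print(25 * 2 * 2)  # 100
```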

It will come as a surprise to many
that human growth is sufficiently regular 
to be so precisely represented by mathemat- 
ical equations. The reason, however, is 
that in the past faulty units have been 
used. The new isochronic units based on the 
law of maturation reveal a constancy in all 
biologic growth hitherto unsuspected. 

[Note to Figure 5: the maximum for height and the standard error of estimate are both expressed in centimeters.]

By means of the equations, the values for each individual at given ages were computed, 375 items in all (Figure 7). The various distributions are shown in full and the curve of mean values has been drawn over them. Also in the figure, the computed growth curves of the shortest and the tallest individuals at the age of 63 months have been drawn to illustrate the effect of such factors as starting points and maxima (represented by "i" and "k" in the equations). Individual C has a high starting point and a low maximum; Individual B, a low starting point and a high maximum. The curve of means is their approximate average, but the distinctive features of each individual curve are lost in the average.

[Figure 7. Steps 2 and 3. Computations of Values from Individual Equations. Height in inches: 375 items, 15 age means, at ages 63 to 231 months in 12-month steps; the computed curves of Individuals B and C are drawn over the distributions. Horizontal axis: Age in Months.]

In passing, it is interesting to note that at 90 months both individuals are approximately "average" or "normal" in height. The conclusion to be drawn from such data is that one cannot be sure from single measurements of two growing individuals that their measurements at any other time, past or future, will maintain an observed relationship. The significance of this conclusion for "matching" in controlled experimentation is self-evident.

Steps four and five of the plan of procedure are shown in Figure 8. The twenty-five values for each constant for each trait were arrayed in tabular form and mean values found. From these mean values an equation was synthesized and a growth curve computed.¹ This curve will be called the "Curve of Constants."

[Figure 8. Steps 4 and 5. Computation of "Curve of Constants": Height. The constants of the general isochronic equation for each of the twenty-five cases are tabulated by cycle (infancy, adolescence), together with their mean values and the corresponding equation for the Curve of Constants.]

1. This "averaging" of constants is frankly an expedient used in the absence of knowledge of the relationships which obtain between the constants. Investigation of these relationships is under way. No claim of basic validity is made for the curve of constants.

Finally the Curve of Means for each trait was compared with the Curve of Constants (Figures 9, 10, 11). In each case it was found that the curve of constants indicated a greater development at all ages than the curve of means, and also that in form it portrays the two-cycle growth taking place better than the curve of means.

[Figure 9. Comparison of Curve of Means with Curve of Constants: Height. C = Curve of Constants; M = Curve of Means; the mean difference and the equation of the Curve of Constants are shown. Axes: Age in Months; Inches.]

[Figure 10. Comparison of Curve of Means with Curve of Constants: Weight. C = Curve of Constants; M = Curve of Means; mean difference 6.9 lbs.; the equation of the Curve of Constants is shown.]

[Figure 11. Comparison of Curve of Means with Curve of Constants: Teeth. C = Curve of Constants; M = Curve of Means; mean difference 1.08 teeth; the equation of the Curve of Constants is shown.]

The significance of the last generalization may not be evident on first reading, because we have so few measurements of growth in educational fields, and because those we do have are usually referred to norms which are only group means. As data accumulate, however, it is becoming clear that mental and educational development are also affected by the cycles of growth in the individual (Figure 12). This suggests that further bases of interpretation are necessary. It suggests further the need to re-examine some of our fundamental concepts, such as mental age, and to determine by experimental trial whether or not other methods of arriving at norms may lead to changes in fact and in interpretation.

[Figure 12. Illustrations of Other Two-Cycle Curves from Repeated Testing of Individuals. Panels: Stanford Achievement Tests, Case 063, Harvard Growth Study; Peet-Dearborn Arithmetic, Case 1538. Horizontal axis: Age in Months.]

The writer's conclusion from this 
study is that the concept of "norms" is in 
need of revision. It is easy to prove that 
at present gross injustice is done many 
children by comparing their scores with 
measures of central tendency which take no 
account of the different stages in the de- 
velopmental process which the scores repre- 
sent. This applies to all norms--physical, 
mental, and educational. In the future 
more attention than at present will need to 
be given to determining the individual pat- 
tern of growth and to interpreting scores 
in terms of that pattern. To this end it 
is essential that school records of tests 
and measurement be so made and kept as to 
supply data for studying the individual 
growth curves. 










SOME ASSUMPTIONS INVOLVED IN PERSONALITY MEASUREMENT 


by 


Gordon Hendrickson 
University of Cincinnati 


A recent statement by Dr. Mark A. May 
challenges workers in the field of personality
measurement. Professor May, who is doubtless
as competent a judge as anyone, says,
"The measurement of personality today is 
about as far advanced as the measurement of 
intelligence twenty-five years ago" (7). 
Perhaps so. But if personality measurement 
has not advanced far, at least it has spread 
widely. A quarter of a century ago, Binet's 
first mental age scale had just made its 
appearance. Today in the area of personali- 
ty and character there are scores of pub- 
lished measuring instruments and bibliogra- 
phies listing a thousand alleged research 
studies (3, 6, 8, 9, 11, 12). 

In the early days of intelligence test- 
ing Alfred Binet, at least, gave the move- 
ment competent leadership. Its scope was 
limited, but it knew where it was going. To- 
day a would-be leader in personality meas- 
urement must consider proposals that differ 
widely if not weirdly, including the psycho- 
galvanic reflex, questionnaires as to neu- 
rotic symptoms, the ability of children to 
stop reading a thrilling story when the pag- 
es are found sealed at a crucial point, rat- 
ings by friends of the subject, free associ- 
ations to verbal stimuli, and dozens of oth- 
er assorted devices for getting at that 
curiously subtle something referred to as 
personality. Under the title “Diagnosing
Personality and Conduct,” Symonds (9) discusses
the following approaches, among others:
observation and rating; adjustment,
attitude, and interest questionnaires; tests 
of conduct, knowledge, and judgment; per- 
formance tests; the free association method; 
various measures of emotions; interviews; 
and psychoanalysis. The area explored by 
these techniques is so vast that it can hardly
be staked off except in negative terms,
as by saying that the words "personality and 
character" will cover those aspects of human 





nature not measured by intelligence, apti- 
tude, and achievement tests. 

In view of the enormous complexity of 
this alluring field for research, many of 
the ambitious students who have signified 
their willingness to be pathfinders have 
taken much for granted. Testing has come 
before analysis. Interpretation has been 
left for others than the examiner. Diffi- 
culties met have not always been stated. 
Hopefulness has substituted for knowledge. 
Publication has been a preeminent goal. 

If these strictures are warranted, it 
should not be surprising to find that ex- 
plicit declarations as to what has been as- 
sumed in the preparation of particular in- 
struments are rare. One of the clearest of 
such analyses, a statement which might well 
serve as a model, has been made by Thurstone 
and Chave (10), reporting the development of 
scales for measuring attitudes. These writ- 
ers are explicit as to certain limitations 
of their work: they seek to measure only 
one attitude variable at a time, and use 
opinions as indirect indices of attitudes; 
they do not try to predict overt conduct, 
and anticipate changes in attitudes; they 
assume a conceptual linear continuum for the 
variable measured; and they expect their 
scales to be used only when people may rea- 
sonably be expected to tell the truth. 

The present paper is restricted to a 
few problems involved in practically all at- 
tempts to measure in the area of personality 
and character. No effort at exhaustive 
treatment will be made. The references will 
lead the interested reader to more compre- 
hensive surveys of the various topics. 

Three basic problems, which may be 
stated briefly in question form, will serve 
as focal points for this discussion. (1) What 
is the nature of personality? (2) How sta- 
ble is personality? (3) What social factors 
are involved in personality measurement? 













The Nature of Personality 

Out of many possible answers to the 
first question, three may be selected for 
emphasis. (See Allport, 1, and Allport 
and Vernon, 2, for further alternatives.) 
A fourth possibility, that of eclecticism, 
will be noted more briefly. 

(a) Personality may be regarded as an 
unanalyzable totality. According to this 
assumption, the personality of each indi- 
vidual is an essentially unique pattern. 
Analysis is supposed to destroy the pattern, 
to be misleading and dangerous. Apparently 
this position denies that personality can 
in any true sense be measured at all. Even 
this view, however, may lead to certain ap- 
proaches to measurement or description. 

First, personalities may be portrayed, as in case studies. Just as an artist portrays a face, not as a collection of unrelated details, but as a total pattern involving many relationships, so a word portrait of an individual may be drawn up. In the Character Education Inquiry one hundred such verbal pictures were prepared of the behavior of particular children with reference to moral stimuli. These were then assembled in linear sequence, to furnish a rating scale against which to check the behavior of any given child. Here the very patterning of behavior has been made the basis of a kind of measurement. (For references to methodology in case studies, see Young, 13.)

Second, May has proposed that an individual's personality be regarded as his "social stimulus value", that is, as "the way in which he impresses others" (7). If this position is adopted, it is not necessary to analyze the behavior of the individual to be measured. Instead, one would analyze the behavior of other people with reference to him. "The individual who has zero personality (if there be such a person) is one who makes no impression on anyone.....At the other end of the scale is the individual who as stimulus produces in others very intense and vivid responses. Thus the measure of a personality is found by counting and measuring the intensity of the responses produced in those with whom it comes in contact" (7).

This interesting suggestion clearly calls for measurement of a single variable: the quantity of response evoked. How significant this factor may be is not so obvious. Consider Gandhi, Napoleon, and Lincoln: if the totals of the positive and negative reactions of those whose lives they have touched could be determined, what would the result signify? In the Metropolitan Museum of Art two of the largest and best known pictures are those of "The Horse Fair" and "Washington Crossing the Delaware." Could these be compared in any important sense in terms of a summation of the reactions of the crowds who pass them?

Finally, the assumption of an unanalyzable personality may lead to the mode of psychological speculation and investigation known as "characterology" or "typology", found especially in contemporary German psychology (6). While aimed at the interpretation of personality and the classification of individuals, this last approach is hardly measurement in the ordinary sense.

(b) A second assumption comes with the weight of very considerable authority. It is that personality is an aggregate of isolable traits or factors. The clearest expression of this view is in Kelley's work, Crossroads in the Mind of Man (4). Kelley declares that each school of psychology postulates certain psychological elements as entities, entitled to an independent status in the field of mental life, and that these concepts, to be given serious consideration, must reveal themselves in terms of measurable differences in conduct. Kelley's method for determining whether there actually exist distinct entities corresponding to the alleged traits, factors, or concepts, is a modification and extension of Spearman's tetrad difference technique. In his book Kelley analyzes data from a variety of tests given at several levels of the public school, with a view to determining "independent mental traits" at each level. The data are restricted to intellectual functions, but the technique could unquestionably be extended to the study of personality characteristics. A final summary of results for the kindergarten group is as follows:

"That the following traits, (1) facility with verbal material, (2) manipulation of spatial relationships, (3) memory, are independent categories of mental life from a very early age (probably from birth) seems hardly open to question. The sensing and retention of geometric forms should probably also be included in this list, also a number factor, because of its indubitable presence in the third- and seventh-grade populations, for it should be recalled that no test for this factor was made in the case of the kindergarten group" (4).
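Spearman's tetrad difference, which Kelley's method modifies and extends, can be illustrated in a few lines. The sketch below shows only Spearman's basic criterion, not Kelley's own procedure: for four tests i, j, k, l, the tetrad difference r_ik·r_jl − r_il·r_jk vanishes when every intercorrelation arises from one general factor. The loadings are hypothetical, and `tetrad_difference` and `one_factor_matrix` are illustrative helpers, not functions from any library.

```python
def tetrad_difference(r, i, j, k, l):
    """Spearman's tetrad difference r_ik * r_jl - r_il * r_jk for tests i, j, k, l."""
    return r[i][k] * r[j][l] - r[i][l] * r[j][k]


def one_factor_matrix(loadings):
    # Correlations implied by a single general factor: r_ij = g_i * g_j for i != j
    return [[1.0 if i == j else gi * gj for j, gj in enumerate(loadings)]
            for i, gi in enumerate(loadings)]


# Hypothetical general-factor loadings for four tests
r = one_factor_matrix([0.9, 0.8, 0.7, 0.6])
print(abs(tetrad_difference(r, 0, 1, 2, 3)) < 1e-12)  # True: one factor suffices

# Add a hypothetical group factor shared only by tests 0 and 1
r[0][1] = r[1][0] = r[0][1] + 0.1
print(round(tetrad_difference(r, 0, 2, 1, 3), 3))  # 0.042, no longer zero
```

A vanishing tetrad is consistent with, not proof of, a single factor; which tetrads vanish depends on which tests enter the matrix, a point that bears directly on the criticism of Kelley's results that follows.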

From Kelley's discussion of his results it is clear that he is anticipating the analysis of mind into a fairly small number of distinct, non-overlapping traits or factors, each of which may ultimately be measured. Elsewhere he suggests that super-traits or type-traits may be discovered (5).

The results in the illustration just given seem disappointing, however. Somehow or other out of the test data has come an array of factors strangely reminiscent of the mental faculties once so popular among formal disciplinarians and phrenologists.

It appears that the statistical treatment largely reveals in the data what was put into them by the selection of tests. Certain of the abilities differentiated out, those for language and number, relate to particular social institutions which command large portions of the time of children, and which are quite certainly not innate or in any final sense elemental in mental life. In other words, the outcome of the analysis depends upon the particular training of the subjects and the selection of testing instruments.

The value of the statistical conception 
of isolated traits for the selection of non- 
duplicating tests can hardly be doubted. But 
that such an analysis is capable of reveal- 
ing the most important facts about personal- 
ity and ultimately of describing it entirely 
in quantitative terms, must at present be 
regarded merely as a challenging assumption. 
(For other trait theories, see references 1 
and 2.) 

(c) Quite different is the assumption that personality has characteristics, or variables, which are analyzable for convenience and as a matter of social utility rather than of psychic independence. By this view, definition and delimitation of these variables depend upon the purpose of the test maker. Trait names thus merely summarize behavior which society, or the personality tester, has determined shall be classified under them. Once such a differentiation of a trait (variable, area, characteristic) has been determined upon, measurement is a matter of sampling the behavior in question.

This assumption seems to open the way for an infinite number of personality tests, overlapping in greater or less degree, and each predicated upon some tester's special classification of behavior. Certainly this is a possible outcome. In practice, however, considerable progress has already been made towards agreement as to many areas or aspects of personality. Neurotic tendencies, introversion-extroversion (despite severe strictures by Kelley and Watson upon this concept), and honesty may serve as illustrative terms. No one could claim that the various measures now available for each of these traits are identical as to range of the behavior sampled. The fact is that these concepts have had a social evolution, and that the proposed tests reflect the various meanings given them by those thinkers who have used the terms. If society finds it worthwhile to classify certain forms of behavior as neurotic, introverted, or honest, it seems perfectly possible to follow its lead and measure samples of such behavior.

This view is probably implicit in many of the measures now available. Thurstone's scales of attitudes, the questionnaires designed to reveal introversion-extroversion, most or all of the many rating schemes, the tests of verbal and non-verbal behavior developed by the Character Education Inquiry, and indeed nearly all so-called character and personality measures can be described as approaches to aspects of personality which have logical and practical differentiation rather than psychological independence.

It would seem the part of common sense to select for measurement variables which overlap as little as possible, and which are in each instance so defined as to give a basis for a linear continuum of values. Statistical techniques such as those developed by Kelley have an obvious function in this regard. Incidentally, the growing practice of naming traits and tests in double-barrelled fashion (introversion-extroversion, ascendancy-submission, self-sufficiency-dependence) implies in each instance a linear scale, by the very designation of its extremes.

If this point of view is adopted, it 
becomes permissible and even desirable to 
follow the lead of popular terminology, to 
bring together samples of behavior which 
are classified under some socially developed 
concept, or which may be classified together 
to meet some socially developed need (for 
example, in connection with vocational guid- 
ance), and finally to use such samples of 
behavior in deriving scores to be arranged 
in a linear fashion with reference to the
variable in question.

(d) A fourth assumption is the amiable 
one of eclecticism, that any of the first 
three may be followed, the choice depending 
upon one's problem. Perhaps an individual's 
personality does have a total pattern, plus 
certain characteristics which may be isolat- 
ed from each other for large groups of per- 
sons by statistical analysis, plus still 
more characteristics which are analyzable 
for convenience and with regard to social 
needs rather than psychological coherence.
It is quite possible to think of a personality
measurement program which utilizes the
portrait technique, May's suggestion that we
determine the reactions of other individuals
to the person measured, a limited group of
tests for such traits as seem to be separable
by Kelley's technique, and a wider group
of questionnaires, rating scales, and behavior
tests which answer questions of social
importance regarding the individual to
be measured.


The Stability of Personality 

The second major question, "How stable 
is personality?” appears to be one for in- 
vestigation rather than a priori assumption. 
The interpretation of personality measure- 
ments, however, has frequently proceeded as 
if certain assumptions could safely be made 
in answer to this question. Certain tests, 
such as the Downey Will-Temperament, have 
seemed to take for granted a permanency in 
non-intellectual characteristics. Others, notably in the field of attitudes, have been prepared with the expectation that the variables measured can be changed by directed teaching. In view of such divergent conceptions, it may be desirable to develop a general point of view as to this issue.

A point of departure may be our knowl- 
edge as to the consistency of intelligence 
test results. Many experiments in retest- 
ing for intellectual capacity have led to 
an expectation of relative consistency in 
behavior over a period of years. When we 
find occasional striking gains or losses in 
I.Q., we tend to look for the explanation 
in terms of the inadequacy of the tests 
rather than in terms of instability in hu- 
man nature. In other words, it is general- 
ly assumed that something basic in native 
capacity, unchanged by training, does exist 
and to some degree may be inferred from in- 
telligence test results. 

A transfer of this attitude to the area of personality testing is but natural. But do the facts warrant this sort of expectation? It is at least possible that personality may alter, may grow, may deteriorate, or improve, whatever view is held of its ultimate nature. To some degree, such changes may be anticipated regardless of intentional provisions for bringing them about. The ordinary play of material and social forces upon the individual is enough to result in marked changes in many persons. The shy child of the first day of school may soon lead the class in a noisy game; the callow youth becomes the staid judge; the hoydenish tomboy of today will be the dowager of fifty years hence.

More than this, we consciously work for modifications in personality. Every psychological clinic, every visiting teacher, every vocational guidance counselor, every classroom teacher who is sensitive to the nuances of individuality strives purposely to overcome the fears and inhibitions of the retiring child, to develop acceptance of responsibility in the rebellious youngster, in short, to build and rebuild personality. The case study literature is full of tales of transformed personality. Here change tends to be accepted as normal rather than as a function of the measurement technique. In other words, we generally assume that much of what we call personality is changeable, capable of transformation.

This is not to say that there is no central unity to a personality. Continuity is as much a fact in growth and life as is change. But the emphasis should probably be upon the potentialities of human nature, in the aspects under consideration, for change.

What is the significance of this emphasis? Three corollaries seem to hold.

(a) First, the possibility of predictions based upon personality measurements must be determined by judging each instance upon its own merits. Sometimes prediction will be possible, sometimes not. To be precise, in order to state the probable accuracy of predictions based upon a personality test we must know the results of previous work on that particular test with similar subjects under similar conditions over the period of time for which prediction is desired. However little faith one may have in the constancy of the I.Q., he must be still more cautious as to the constancy of personality measures.

(b) Second, when and if quotients are 
developed for emotional maturity, general 
personality development, moral growth, and 
the like, they will probably be comparable 
to educational quotients (E.Q.) rather
than to intelligence quotients (I.Q.). The
E.Q. is markedly affected by training; the 
I.Q. only to a moderate degree. 

(c) Third, labels for an individual's
personality make-up are even more dangerous
than they are for his intelligence. If it 
is unsafe to label a child as of high-aver- 
age intelligence, or dull-normal, it is 
still more unsafe to label him as "highly 
suggestible", or "slightly dishonest.” Still 
worse, of course, is the common practice of 
classifying people in terms of the phrases 
used to identify one end or the other of a 
linear scale. Obviously some false assump- 
tions underlie glib characterizations of 
people as “introverts" or "extroverts", 
"conservatives" or "radicals", and the like. 


Social Factors Involved in Personality Measurement

The whole social setting of personality measurement is so important that a final section of this paper must be devoted to it.
Most accounts of personality assume that it 
is in large part a resultant of the reac- 
tions of the developing organism to the 
world of persons. Although, logically, 
traits can be analyzed out that seem to be 
non-social (accuracy for example), in actu- 
al life even these are usually related to 
social stimuli. Native power, to be accu- 
rate, hardly belongs in the area of this 
discussion, but accuracy as a personality 
trait involves a socially conditioned de- 
sire to be accurate. 

Not only are personality and character 
terms with little meaning apart from social 
relationships, but a program of personality 
measurement must face serious dangers in 
the relationship between the persons meas- 
ured and the persons doing the measuring. 


Recognition of these various social factors 
leads to consideration of several important 
problems. 

(a) Innate factors in personality are extremely unlikely to be isolated from acquired factors. This is a familiar fact in current psychology. The tendency is to give up the attempt to separate nature and nurture. Personality tests that attempt to isolate aspects of human nature not affected by training, and particularly by social training, are likely to be so restricted in area as to be almost worthless. Early attempts in this direction, in fact, met with little success and have now largely been abandoned.

(b) The social setting of the measurement must be regarded as part of the situation to which the subject is responding. Specifically, who administers the test, who else is taking it, the purpose for which it is administered, and the relationship between examiner and examinee significantly affect personality and character test responses. A few illustrations may make this point clear. In the Character Education Inquiry scores on moral judgment tests varied according to whether the tests were taken at home, at school, or in Sunday school. Personality and character measures have repeatedly drawn responses which the subjects thought would be pleasing to the examiner. It is easy to demonstrate a marked change in score in a certain direction for a group of subjects answering a personality inventory under two conditions: first, that of interest in finding out about their own make-up, and second, that of trying to make as good an impression as possible in applying for a particular type of work. (The writer has carried out an unpublished experiment of this type with the Bernreuter Personality Inventory.)

Indeed, the relationship between the 
subject of the measurement and the examiner 
is one of the most neglected and most impor- 
tant factors in the development of personal- 
ity and character tests. Five possibilities 
may be briefly suggested, these arranged 
somewhat in the form of a scale. Curiously, 
both ends of this scale are preferable to 
the middle.

At one extreme, there is no necessary 
relationship between subject and examiner; 
the examiner does nothing to set up a test 
situation, simply observes the subject in 
his ordinary life, and records his observa- 
tions. This is essentially a rating situa- 
tion. It must be pointed out that ratings 
have been tremendously improved within re- 
cent years, that the old prejudice against 
rating is not justified when refined tech- 
niques are used, and that rating is one of 
the most valuable of all measurement tools. 
(Compare the Character Education Inquiry 
rating results.)

In the second instance, the examiner 
sets up a situation which simulates a natu- 
ral situation, that is, one which might 
easily occur in the subject's life, and ob- 
serves its results. The subject does not 
know that he is being tested. Many of the 
Character Education Inquiry tests follow 
Voelker's earlier tests in using this pro- 
cedure. Sometimes it yields a valuable quan- 
titative score. There is danger under these 
conditions that the subject may learn of the 
tests and may come to feel that an unfair 
advantage has been taken of him. 

The third condition is one in which the 
examiner sets up a clearly artificial situa- 
tion, ostensibly for one purpose, actually 
for another. A number of tests labelled as 
measures of skill or knowledge are really 
designed to test personality traits. In one
of the Downey Will-Temperament tests, the
examiner actually tells his subjects time does
not count, when time is the only thing that 
does count. The student reaction to such 
tests is likely to be thoroughly antagonis- 
tic, to undo any good that might come from 
them, and to jeopardize all future tests 
whether good or bad. 

Fourth, and somewhat less objection- 
able, is the setting up of a situation which 
is clearly artificial and is recognized by 
the subject as such, but in which the ex- 
aminer has the subject's confidence to such 
an extent that he is willing to follow di- 
rections without knowing the exact purpose 
of them. The success of such a procedure
depends upon the confidential relationship 
between subject and examiner, and upon the 
subject having faith in the examiner's good 
intentions and sincerity. This is a rather 
unnatural, strained situation. Certain 
questionnaires on attitudes and personality 
inventories fall into this category. 

Finally, a thoroughly wholesome condi- 
tion is one which involves setting up a 
test or questionnaire situation recognized 
as artificial, with the subject informed as 
to the real object of the test, his coopera- 
tion elicited, and results finally to be 
reported back to him just as they might be 
on an achievement or an intelligence test. 
The Strong Interest Analysis Blank and the 
Thurstone Personality Schedule are in this 
class. Vocational guidance tests and tests
to measure personality development where
students desire such should largely fall
here.

Contrast the difficulties suggested in 
this survey with the comparatively simple 
situation of the intelligence test. An in- 
telligence test is a game of wits, and is so 
recognized. In fact, it is wits that one 
is concerned to measure in this case. On 
the other hand, a personality test fails be- 
fore it begins if it is such a contest. In 
personality measurement one is concerned to 
get as true a picture as possible of the 
make-up of the individual at the time of 
the testing. This means that every possible precaution must be taken to avoid distortion. The keenness of psychological insight which examiners have will largely determine their effectiveness. They must be able to see the testing situation as the subject of the measurement sees it. They
must see his objectives in the situation as 
well as theirs. 

(c) Finally, it should be clear that 
the standards of comparison for personality 
test scores must depend upon particular so- 
cial groups. These standards may be derived 
from the best judgment of supposed experts 
as in the case of moral knowledge tests, or 
from the average performance of certain 
groups. In either case, the norms are sub- 
ject to change when the standardizing group 
is changed. Personality measurement is a 
function of the mores of particular socie- 
ties. One cannot separate sharply a per- 
sonality test from its social setting. 

In conclusion, this consideration of 
some of the assumptions and problems in- 
volved in personality measurement may well 
make those of us interested in this task 
modest in claims. Caution is a prime char- 
acteristic of scientific attitude. Yet 
there is no need to denounce the whole movement or refuse to participate in it. Personality measurement need not be trivial, or naive, or esoteric. In our changing civilization one value has not been shaken, but has even increased. This is the value of
human life. To the study, complex and ardu- 
ous as it may be, of the most fundamental 
aspects of human nature, psychologists and 
educators may safely dedicate their energies, 
confident that the task is worthy and the 
rewards are great. 


REFERENCES 


1. Allport, G. W. "Concepts of Trait and Personality," Psychological Bulletin, XXIV (1927), pp. 284-293.

2. Allport, G. W., and Vernon, P. E. "The Field of Personality," Psychological Bulletin, XXVII (1930), pp. 677-730.

3. Fryer, Douglas. The Measurement of Interests (New York: Henry Holt and Company, 1931), 488 pp.

4. Kelley, T. L. Crossroads in the Mind of Man (Stanford, California: Stanford University Press, 1928).

5. Kelley, T. L. "Oddities in Mental Make-up," School and Society, pp. 529-544.

6. Maller, J. B. "Studies in Character and Personality in German Psychological Literature," Psychological Bulletin, XXX (1933), pp. 209-222.

7. Murchison, Carl, and others. The Foundations of Experimental Psychology, Genetic Psychology Monographs (Worcester, Massachusetts: Clark University Press, 1929), 907 pp.

8. Threlkeld, A. L., and others. "Character Education," Tenth Yearbook of the Department of Superintendence (Washington, D.C.: National Education Association, 1932), 536 pp.

9. Symonds, P. M. Diagnosing Personality and Conduct (New York: Century Company, 1931), 602 pp.

10. Thurstone, L. L., and Chave, E. J. The Measurement of Attitudes (Chicago: University of Chicago Press, 1929), 96 pp.

11. Watson, Goodwin. "Character and Personality Tests," Psychological Bulletin, XXX (1933), pp. 467-487.

12. Watson, Goodwin. "Tests of Personality and Character," Review of Educational Research, II (June, 1932), 88 pp.

13. Young, Kimball. "Topical Summaries of Current Literature: Personality Studies," American Journal of Sociology, XXXII (1927), pp. 953-971.
















THE ECONOMICAL COLLECTION OF DATA FOR TEST VALIDATION 
by 
Paul Horst 
The Proctor & Gamble Company 
Cincinnati, Ohio 


Recently a test was developed for the 
selection of salesmen in a large industrial 
organization. The general method for de- 
veloping the test consisted of obtaining 
measures of efficiency on the salesmen al- 
ready employed by the organization, and then 
finding test material which would yield a 
high correlation with the criterion measures 
of efficiency. The validation procedure 
adopted was that of testing each individual 
item to find which discriminated between the 
better and the poorer men. This meant, of 
course, that the experimental test material 
had to be administered to the salesmen. 

Three general problems presented
themselves. These were:

1. To try out as many items as possible. 

2. To require as little testing time 
for each man as possible. 

3. To establish norms from the results 
on the experimental test material. This was 
necessary because the cost of readminister- 
ing the selected test material was prohibi- 
tive. 

A total of about 600 men was available 
on which to try out the test items. In or- 
der to double the amount of test material to 
be tried out, the men were divided into two 
comparable groups. The procedure for in- 
suring comparability will not be discussed 
here as it does not bear on the technique to 
be subsequently described. Separate sets 
of test material were administered to each 
group. The two sets of test material were 
roughly comparable. 

But each set of material consisted of two general types of test items, namely, personality and mental alertness items, so that essentially two tests were developed for each criterion group: a mental alertness and a personality test.

The selected material from both crite- 
rion groups was then combined into a single 







composite test including two parts; the first 
was the mental alertness part composed of 
the mental alertness sections from both 
criterion groups; the second was the per- 
sonality part composed of the personality 
sections from both groups. 

The composite test was to be used for 
the selection of new salesmen. Hence, critical
scores had to be established. But this
implied a distribution of scores of all the 
salesmen on the total test. Due to the way 
the experimental material was originally ad- 
ministered, only partial scores were avail- 
able for each man. The problem, then, was 
to estimate a man's total mental alertness 
and total personality scores, having given 
his partial scores on these two tests. 

In general, the technique developed to 
solve this problem may be applied to any 
case in which sets of scores on two tests, 
each of specified length, are available when 
it is desired to estimate individual scores 
on lengthened forms of the tests. The tech- 
nique is based on the principle that when 
two correlated tests are lengthened, the cor- 
relation between them will be increased. 
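The lengthening principle can be made concrete with the correlation formula for lengthened tests that the article gives as its formula (1): r_ab = ab·r_12 / (√(a + (a² − a)r_11) · √(b + (b² − b)r_22)). The following is a minimal sketch, assuming the added items are comparable to the original ones; `lengthened_correlation` is a hypothetical helper name, and the reliabilities and intercorrelation are invented for illustration.

```python
from math import sqrt


def lengthened_correlation(r12: float, r11: float, r22: float, a: float, b: float) -> float:
    """Correlation between test 1 lengthened a times and test 2 lengthened b times.

    r11 and r22 are the reliabilities of the original tests, r12 their intercorrelation.
    """
    # Standard deviation of each lengthened test, in units of the original test's SD
    spread_a = sqrt(a + (a * a - a) * r11)
    spread_b = sqrt(b + (b * b - b) * r22)
    return a * b * r12 / (spread_a * spread_b)


# Hypothetical values: r12 = 0.5, reliabilities 0.6 and 0.7, both tests doubled
r = lengthened_correlation(r12=0.5, r11=0.6, r22=0.7, a=2, b=2)
print(round(r, 3))  # 0.606
```

With a = b = 1 the function returns r_12 unchanged, and as a and b grow without limit the value approaches r_12 / √(r_11 r_22), the familiar correction for attenuation.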

Suppose we designate the original tests by $t_1$ and $t_2$ and the lengthened forms by $t_a$ and $t_b$, respectively. We let

$r_{11}$ = reliability of test 1
$r_{22}$ = reliability of test 2
$r_{12}$ = correlation between tests 1 and 2
$a$ = number of times test 1 is increased in length
$b$ = number of times test 2 is increased in length
$r_{ab}$ = correlation between the lengthened forms of tests 1 and 2

Then

$$r_{ab} = \frac{ab\,r_{12}}{\sqrt{a + (a^2 - a)\,r_{11}}\;\sqrt{b + (b^2 - b)\,r_{22}}} \qquad (1)$$




assuming that:

1. The average of all the product moments involving an added item in test (a) and an added item in test (b) is equal to the average of all the product moments involving an item in test 1 and an item in test 2.

2. The average standard deviation of the items in test 1 is equal to the average standard deviation of the added items in test (a).

3. The average standard deviation of the items in test 2 is equal to the average standard deviation of the added items in test (b).

4. The average of all the product moments involving one or two added items in test (a) is equal to the average of all the product moments involving the items in test 1.

5. The average of all the product moments involving one or two added items in test (b) is equal to the average of all the product moments involving the items in test 2.

The added test material is comparable to the original material if these assumptions are satisfied.

Formula (1), with a slight change in notation, is given by Kelley.¹ From it we can estimate the correlation between the lengthened forms of the two tests, assuming, of course, that the items in the additions to the tests are comparable, respectively, to the items in the original tests.

¹ T. L. Kelley, Statistical Method (New York: Macmillan Company, 1923), p. 205, formula 156.

First, let us consider only standard scores on both the original and the lengthened forms of the tests. We desire to predict new scores, $Z_a$ and $Z_b$, as linear functions of the original scores. Suppose we define

$$\alpha_1 = \sqrt{\frac{1 + r_{ab}}{1 + r_{12}}}, \qquad \alpha_2 = \sqrt{\frac{1 - r_{ab}}{1 - r_{12}}} \qquad (2)$$

$$c_1 = \frac{\alpha_1 + \alpha_2}{2}, \qquad c_2 = \frac{\alpha_1 - \alpha_2}{2} \qquad (3)$$

then the equations for predicting the scores on the lengthened tests are

$$Z_a = c_1 Z_1 + c_2 Z_2, \qquad Z_b = c_2 Z_1 + c_1 Z_2 \qquad (4)$$

where $Z_1$ and $Z_2$ are the standard scores on the original tests and $Z_a$ and $Z_b$ are the standard scores on the lengthened tests. Since the predicted scores are expressed in terms of standard units, the conditions to be satisfied by (4) are that

$$1 = \sigma^2_{(c_1 Z_1 + c_2 Z_2)} \qquad (5)$$

$$1 = \sigma^2_{(c_2 Z_1 + c_1 Z_2)} \qquad (6)$$

$$\frac{ab\,r_{12}}{\sqrt{a + (a^2 - a)\,r_{11}}\;\sqrt{b + (b^2 - b)\,r_{22}}} = \frac{\sum (c_1 Z_1 + c_2 Z_2)(c_2 Z_1 + c_1 Z_2)}{N} \qquad (7)$$

where $N$ is the number of cases. Evaluating the right-hand side of (5) we have

$$1 = c_1^2 + c_2^2 + 2c_1c_2\,r_{12}$$

or, substituting from (3),

$$1 = \frac{\alpha_1^2 + \alpha_2^2}{2} + \frac{\alpha_1^2 - \alpha_2^2}{2}\,r_{12}$$

which, after substituting from (2), becomes an identity. Similarly (6) is proved to be an identity.

To prove (7) we evaluate the right-hand side. We have

$$r_{ab} = 2c_1c_2 + (c_1^2 + c_2^2)\,r_{12}$$

or, substituting from (3),

$$r_{ab} = \frac{\alpha_1^2 - \alpha_2^2}{2} + \frac{\alpha_1^2 + \alpha_2^2}{2}\,r_{12}$$

which, after substituting from (2), also becomes an identity.

Thus we see that the conditions imposed by equations (5), (6), and (7) are satisfied by equations (4). In actually predicting new scores from the original tests, however, the original scores will usually be raw scores. Equations (4) predict the new scores in terms of standard scores on the original tests. To predict the new standard scores in terms of the original raw scores we employ the conventional definition

$$Z_1 = \frac{X_1 - M_1}{\sigma_1}, \qquad Z_2 = \frac{X_2 - M_2}{\sigma_2}$$






















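and rewrite equations (4) thus,

$$Z_a = c_1\frac{X_1 - M_1}{\sigma_1} + c_2\frac{X_2 - M_2}{\sigma_2}, \qquad Z_b = c_2\frac{X_1 - M_1}{\sigma_1} + c_1\frac{X_2 - M_2}{\sigma_2}$$

or rearranging

$$\begin{aligned} Z_a &= \frac{c_1}{\sigma_1}X_1 + \frac{c_2}{\sigma_2}X_2 - \left(\frac{c_1}{\sigma_1}M_1 + \frac{c_2}{\sigma_2}M_2\right) \\ Z_b &= \frac{c_2}{\sigma_1}X_1 + \frac{c_1}{\sigma_2}X_2 - \left(\frac{c_2}{\sigma_1}M_1 + \frac{c_1}{\sigma_2}M_2\right) \end{aligned} \tag{8}$$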
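The identities used above to show that equations (4) satisfy conditions (5) and (7) can be checked numerically. The Python sketch below is an illustration only and not part of the original method. It computes $c_1$ and $c_2$ from (2) and (3), then evaluates the right-hand sides of (5) and (7). For any admissible pair of correlations these should come out as 1 and $r_{ab}$. The function names are hypothetical.

```python
import math

def weights_c(r12, r_ab):
    """c1, c2 from equations (2) and (3); assumes |r12| < 1 and |r_ab| <= 1."""
    a1 = math.sqrt((1 + r_ab) / (1 + r12))
    a2 = math.sqrt((1 - r_ab) / (1 - r12))
    return (a1 + a2) / 2, (a1 - a2) / 2

def check_conditions(r12, r_ab):
    """Right-hand sides of (5) and (7), written in terms of c1, c2 and r12."""
    c1, c2 = weights_c(r12, r_ab)
    variance_a = c1**2 + c2**2 + 2 * c1 * c2 * r12         # should equal 1
    correlation_ab = 2 * c1 * c2 + (c1**2 + c2**2) * r12    # should equal r_ab
    return variance_a, correlation_ab

for r12, r_ab in [(0.72, 0.76), (0.30, 0.45), (-0.20, 0.10)]:
    v, r = check_conditions(r12, r_ab)
    print(f"r12={r12:+.2f} r_ab={r_ab:+.2f} -> variance={v:.12f} correlation={r:.12f}")
```

The pair .72, .76 is taken from the numerical illustration given later in the article. The other pairs are arbitrary.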


But for the purpose of establishing 
standards it is desirable to predict raw 


scores on the lengthened tests instead of 
standard scores. To do this we write 
$$Z_a = \frac{X_a - M_a}{\sigma_a} \tag{9}$$

but

$$M_a = aM_1 \tag{10}$$


since, by assumption, the items in the addi- 
tions to the tests are of the same average 
difficulty as the items in the original 
tests. 

It may be shown that

$$\sigma_a = \sigma_1\sqrt{a + (a^2 - a)r_{11}}$$

The right-hand side of this formula is simply the left denominator factor of formula (1) multiplied by the standard deviation of test 1. Its development is given by Kelley.¹
We let

$$A = \sqrt{a + (a^2 - a)r_{11}}, \qquad B = \sqrt{b + (b^2 - b)r_{22}} \tag{11}$$

so that (9) may be written

$$Z_a = \frac{X_a - aM_1}{A\sigma_1} \tag{12}$$







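Similarly we may write

$$Z_b = \frac{X_b - bM_2}{B\sigma_2} \tag{13}$$

Substituting (12) and (13) in (8), we get finally

$$\begin{aligned} X_a &= c_1AX_1 + c_2A\frac{\sigma_1}{\sigma_2}X_2 + \left[aM_1 - \left(c_1AM_1 + c_2A\frac{\sigma_1}{\sigma_2}M_2\right)\right] \\ X_b &= c_2B\frac{\sigma_2}{\sigma_1}X_1 + c_1BX_2 + \left[bM_2 - \left(c_2B\frac{\sigma_2}{\sigma_1}M_1 + c_1BM_2\right)\right] \end{aligned} \tag{14}$$

Equations (14) can be written more compactly

$$X_a = E_{11}X_1 + E_{12}X_2 + K_a, \qquad X_b = E_{21}X_1 + E_{22}X_2 + K_b \tag{15}$$

if we let

$$\begin{aligned} E_{11} &= c_1A, & E_{12} &= c_2A\frac{\sigma_1}{\sigma_2}, \\ E_{21} &= c_2B\frac{\sigma_2}{\sigma_1}, & E_{22} &= c_1B, \\ K_a &= aM_1 - (E_{11}M_1 + E_{12}M_2), & K_b &= bM_2 - (E_{21}M_1 + E_{22}M_2) \end{aligned} \tag{16}$$

remembering that

$$c_1 = \frac{a_1 + a_2}{2}, \quad c_2 = \frac{a_1 - a_2}{2}, \quad a_1 = \sqrt{\frac{1 + r_{ab}}{1 + r_{12}}}, \quad a_2 = \sqrt{\frac{1 - r_{ab}}{1 - r_{12}}}, \quad r_{ab} = \frac{ab\,r_{12}}{AB}$$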


1. Op. cit., p. 215, formula 164.

Equations (8) and (15) are the ones which will ordinarily be used in predicting scores on the lengthened tests. Equations (4) will rarely be used with actual data, as test scores are rarely given in standard deviation units.

From the derivations it is evident that none of these three sets of equations makes any assumptions whatsoever concerning the shapes of the original distributions of scores or, for that matter, the shapes of the distributions of the predicted scores.

To illustrate the method numerically, the following data are available:

$r_{11} = .83$, $\sigma_1 = 9.4$
$r_{22} = .87$, $\sigma_2 = 7.3$
$r_{12} = .72$, $M_1 = 42.2$
$a = 1.5$, $M_2 = 35.7$
$b = 1.7$

From equation (11) we get

$$A = \sqrt{1.5 + (1.5^2 - 1.5)(.83)} = 1.46, \qquad B = \sqrt{1.7 + (1.7^2 - 1.7)(.87)} = 1.65$$

From (1) we have

$$r_{ab} = \frac{1.5 \times 1.7 \times .72}{1.46 \times 1.65} = .76$$

From (2)

$$a_1 = \sqrt{\frac{1 + .76}{1 + .72}} = 1.01, \qquad a_2 = \sqrt{\frac{1 - .76}{1 - .72}} = .93$$

From (3)

$$c_1 = \frac{1.01 + .93}{2} = .97, \qquad c_2 = \frac{1.01 - .93}{2} = .04$$

Equations (16) give

$$\begin{aligned} E_{11} &= .97 \times 1.46 = 1.42 \\ E_{12} &= .04 \times 1.46 \times \frac{9.4}{7.3} = .08 \\ E_{21} &= .04 \times 1.65 \times \frac{7.3}{9.4} = .05 \\ E_{22} &= .97 \times 1.65 = 1.60 \\ K_a &= 1.5 \times 42.2 - (1.42 \times 42.2 + .08 \times 35.7) = .52 \\ K_b &= 1.7 \times 35.7 - (.05 \times 42.2 + 1.60 \times 35.7) = 1.46 \end{aligned}$$

Finally, we get for equations (15)

$$X_a = 1.42X_1 + .08X_2 + .52, \qquad X_b = .05X_1 + 1.60X_2 + 1.46$$
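The arithmetic of this illustration can be reproduced with a short Python sketch of equations (1), (2), (3), (11), and (16). The hand computation rounds every intermediate quantity to two decimals. The hypothetical `ndigits` parameter below can mimic that rounding. With `ndigits=None` the unrounded values are kept. In that case the constants $K_a$ and $K_b$ come out noticeably different from .52 and 1.46, because they are differences of nearly equal quantities. The function and parameter names are illustrative only.

```python
import math

def _rnd(x, ndigits):
    """Round only when mimicking the two-decimal hand computation."""
    return x if ndigits is None else round(x, ndigits)

def lengthened_test_equations(r11, r22, r12, a, b, s1, s2, m1, m2, ndigits=None):
    """Coefficients (E, K) of equations (15) for raw scores on the lengthened tests."""
    A = _rnd(math.sqrt(a + (a**2 - a) * r11), ndigits)        # (11)
    B = _rnd(math.sqrt(b + (b**2 - b) * r22), ndigits)        # (11)
    r_ab = _rnd(a * b * r12 / (A * B), ndigits)               # (1)
    a1 = _rnd(math.sqrt((1 + r_ab) / (1 + r12)), ndigits)     # (2)
    a2 = _rnd(math.sqrt((1 - r_ab) / (1 - r12)), ndigits)     # (2)
    c1 = _rnd((a1 + a2) / 2, ndigits)                         # (3)
    c2 = _rnd((a1 - a2) / 2, ndigits)                         # (3)
    e11 = _rnd(c1 * A, ndigits)                               # (16)
    e12 = _rnd(c2 * A * s1 / s2, ndigits)
    e21 = _rnd(c2 * B * s2 / s1, ndigits)
    e22 = _rnd(c1 * B, ndigits)
    k_a = _rnd(a * m1 - (e11 * m1 + e12 * m2), ndigits)
    k_b = _rnd(b * m2 - (e21 * m1 + e22 * m2), ndigits)
    return {"A": A, "B": B, "r_ab": r_ab,
            "X_a": (e11, e12, k_a), "X_b": (e21, e22, k_b)}

data = dict(r11=0.83, r22=0.87, r12=0.72, a=1.5, b=1.7,
            s1=9.4, s2=7.3, m1=42.2, m2=35.7)
print(lengthened_test_equations(**data, ndigits=2))   # mimics the printed rounding
print(lengthened_test_equations(**data))              # full precision
```

Each tuple holds the two slopes and the constant of one equation in (15).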
If it is desired to estimate all of the individual scores for a large number of cases, it is profitable to make up a facilitating table for each equation of the pair. The values of $X_1$ will be given across the top of the table. The values of $X_2$ will be given down the side. The body of the table will give the values of $X_a$ or $X_b$, as the case may be, for any pair of values of $X_1$ and $X_2$.
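A facilitating table of this kind is easy to generate mechanically. The Python sketch below is a hypothetical helper, not part of the original procedure. It tabulates one equation of (15), using the rounded coefficients of the illustration for $X_a$, with $X_1$ across the top and $X_2$ down the side. The score ranges are arbitrary choices made for display.

```python
def facilitating_table(e1, e2, k, x1_values, x2_values):
    """Rows are X2 values and columns are X1 values; each cell is e1*X1 + e2*X2 + k."""
    return {x2: {x1: round(e1 * x1 + e2 * x2 + k, 1) for x1 in x1_values}
            for x2 in x2_values}

def print_table(table, x1_values):
    print("X2 \\ X1 " + "".join(f"{x1:>7}" for x1 in x1_values))
    for x2, row in table.items():
        print(f"{x2:>7} " + "".join(f"{row[x1]:>7}" for x1 in x1_values))

x1_values = range(38, 47, 2)      # X1 across the top
x2_values = range(32, 40, 2)      # X2 down the side
table_a = facilitating_table(1.42, 0.08, 0.52, x1_values, x2_values)   # the X_a equation
print_table(table_a, x1_values)
```

The same call with the coefficients .05, 1.60, and 1.46 would produce the table for $X_b$.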

























ITEM ANALYSIS BY THE METHOD OF SUCCESSIVE RESIDUALS 


by 


Paul Horst 
The Procter & Gamble Company
Cincinnati, Ohio 


The problem of constructing a test 
which measures accurately what it purports 
to measure is one of the most difficult and, 
at the same time, one of the most important 
problems encountered in objective test work. 
Technically, this is the problem of con- 
structing statistically valid tests. 

Methods of individual item analysis, in which each item in the test is tested to see whether it differentiates sufficiently between high and low criterion measures, have been employed for some time. This principle of item analysis has been employed with considerable success in industrial personnel work.¹

In general, however, the methods of individual item analysis do not take account of the correlations between items. For this reason, one item may duplicate the function of one or more other items. It is desirable to have a method which will enable one to select that combination of items which exhibits the optimal combination of mutual independence and criterion correlation.

In a recent article² the writer has discussed in non-mathematical language a type of
item analysis which takes account of the 
intercorrelations of the items. It is the 
purpose of this article to develop the mathe- 
matical and statistical theory underlying the 
method and to illustrate the work sheet set- 
up for carrying out the mechanical opera- 
tions. 


I. THE GENERAL LEAST SQUARE SOLUTION 


Suppose we have a set of criterion measures, $c_1, c_2, \ldots, c_n$, and a set of test items, $x_1, x_2, \ldots, x_m$, where each $x$ represents an item and the score on an item is either zero or unity. Our most general problem is to select from the series of items that combination of items which will give the maximum correlation with the criterion when optimal weights are assigned to all the items.

Let us first define the matrix of the 
items as follows: 


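$$\begin{matrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{matrix} \tag{1}$$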








where a row represents an individual and a column represents an item. The elements $a_{ij}$ take only the values unity or zero, depending upon whether the individual marked the item correctly or incorrectly.

Let us consider the items as independ- 
ent variables and the criterion measures as 
the dependent variables. Thus we may repre- 
sent the set of equations defined by the 
criterion scores and the item scores by the 
augmented matrix. 


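$$\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1m} & c_1 \\ a_{21} & a_{22} & \cdots & a_{2m} & c_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} & c_n \end{array}\right] \tag{2}$$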








Now if m =n it is mathematically pos- 
sible to determine a weight for each item 
such that the criterion score would be ac- 
curately determined provided the matrix 
given by (2) satisfied certain mathematical 
conditions. We shall not consider these 
conditions in this article. 





1. R. S. Uhrbrock and M. W. Richardson, "Item Analysis, the Basis for Constructing a Test for Forecasting Supervisory Ability", The Personnel Journal, XII (October, 1933), p. 141.
2. Paul Horst, "Increasing the Selection Efficiency of Personnel Tests", The Personnel Journal, XII (February, 1934), pp. 254-259.




If $m$ is greater than $n$, then it is possible, by selecting any combination of $n$ items, to determine weights for them which will exactly predict the criterion, provided the mathematical conditions just referred to are satisfied.

But the correlation of unity between 
the criterion and the weighted test items 
which we get by this procedure is spuriously 
high. This correlation coefficient is ac- 
tually a coefficient of multiple correlation. 
Now a multiple correlation coefficient is a 
function of the number of cases involved as 
well as the number of variables. This function is given by¹

$$R_c^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - m} \tag{3}$$


where

$R_c$ is the coefficient adjusted for the number of variables
$R$ is the unadjusted coefficient
$n$ = number of cases
$m$ = number of variables

From (3) it is evident that even though $R$ is unity, $R_c$ is indeterminate when $n = m$.

Evidently, then, it is not valid to determine optimal weights for the items with $m = n$. But suppose we form all possible combinations of $n - k$ items, where $k$ takes all values from 1 to $n - 1$. Theoretically, for each combination we could calculate a set of weights and a multiple correlation coefficient, adjusted by means of equation (3). The set, then, which yielded the highest corrected correlation coefficient would be the one from which to predict the criterion.

However, two very obvious objections 
present themselves. 

In the first place, the number of combinations becomes enormous as n approaches
any statistically useful magnitude. Thus 
when n is large enough to justify statisti- 
cal analysis of the data, even a very small 
proportion of the total number of combina- 
tions is still a very large figure. 

In the second place, for any given combination of items, the weights must be determined by a solution of a set of equations
equal in number to the number of items in 
the combination. If this number is greater 
than 8 or 10, the solution becomes very 




laborious, the amount of work required in- 
creasing approximately as the square of the 
number of items. 

We must, therefore, seek a more practi- 
cable method for selecting the combination 
of items which will give the highest corre- 
lation with the criterion. Whatever the 
method, however, we shall not obtain as high 
a correlation with the criterion as if we 
employed the one just outlined. 

The method which we shall develop will 
be called the Method of Successive Residuals. 
Two procedures will be developed. The first 
is more general than the second and is based 
on differential item weights. The second is 
based on unit item weights. 


II. DIFFERENTIAL ITEM WEIGHTS 


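We shall begin by considering the Pearson product-moment coefficient of correlation between the criterion $c_0$ and each item in turn. For the $k$th item, with $c_0$ and $x_k$ measured as deviations from their means, we have

$$r_{c_0x_k} = \frac{\Sigma c_0x_k}{n\sigma_{c_0}\sigma_{x_k}} \tag{4}$$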

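This may be written in terms of raw scores

$$r_{c_0x_k} = \frac{\frac{\Sigma C_0X_k}{n} - \frac{\Sigma C_0}{n}\cdot\frac{\Sigma X_k}{n}}{\sigma_{c_0}\sqrt{\frac{\Sigma X_k^2}{n} - \left(\frac{\Sigma X_k}{n}\right)^2}} \tag{5}$$

or multiplying both numerator and denominator by $n^2$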


$$r_{c_0x_k} = \frac{n\Sigma C_0X_k - (\Sigma C_0)(\Sigma X_k)}{n\sigma_{c_0}\sqrt{n\Sigma X_k^2 - (\Sigma X_k)^2}} \tag{6}$$


Multiplying this last equation by $n\sigma_{c_0}$, we have

$$n\sigma_{c_0} r_{c_0x_k} = \frac{n\Sigma C_0X_k - (\Sigma C_0)(\Sigma X_k)}{\sqrt{n\Sigma X_k^2 - (\Sigma X_k)^2}} \tag{7}$$

Since an item score is either unity or zero, $\Sigma X_k$ is simply the number of individuals who marked item $k$ in a specified manner. We designate this number by $n_k$. Since $1^2 = 1$, we have also $\Sigma X_k^2 = n_k$. Hence, we may rewrite (7) as








$$n\sigma_{c_0} r_{c_0x_k} = \frac{n\Sigma C_0X_k - (\Sigma C_0)n_k}{\sqrt{n n_k - n_k^2}} \tag{8}$$

Our first step is to select the item which gives the highest value in (8), that is, the item which gives the highest correlation with the criterion. It is not necessary to divide through by $n\sigma_{c_0}$, as this value is constant throughout the series. Let us arbitrarily designate the item which gives the highest correlation as item number 1. Since it gives the highest correlation, item 1 will predict the criterion score more accurately in the least square sense than will any of the other items.

1. B. B. Smith, "Forecasting the Acreage of Cotton", Journal of the American Statistical Association, XX, 1925.

For each individual let us now consider an equation of the type


$$C_0 - (A_1X_1 + B_1) = C_1 \tag{9}$$

where $A_1$ and $B_1$ have been determined by the method of least squares.

The formula for $A_1$ is simply an adaptation of the general least square regression formula:¹

$$b_{yx} = \frac{\Sigma xy}{\Sigma x^2} \tag{10}$$

In our notation this equation would be

$$A_1 = \frac{n\Sigma C_0X_1 - (\Sigma C_0)n_1}{n n_1 - n_1^2} \tag{11}$$
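Because the items are scored zero or unity, (8) and (11) need only the item frequencies $n_k$ and the criterion sums $\Sigma C_0X_k$. The following Python sketch uses hypothetical names and small made-up data. It computes the value of (8) for every item and takes as item 1 the one with the highest value, as the text prescribes. It then obtains $A_1$ from (11) and $B_1$ from the usual least-squares intercept. The text does not write out the intercept formula, so that part is an assumption of this sketch.

```python
import math

def item_sums(c, x):
    """n, n_k, sum of C over those who marked the item, and sum of C."""
    return len(c), sum(x), sum(ci for ci, xi in zip(c, x) if xi), sum(c)

def n_sigma_r(c, x):
    """Left-hand side of (8): n * sigma_c * r_cx for a 0/1 item column x."""
    n, nk, s_cx, s_c = item_sums(c, x)
    return (n * s_cx - s_c * nk) / math.sqrt(n * nk - nk * nk)

def slope_intercept(c, x):
    """A from (11); B = (sum C - A n_k) / n, the least-squares intercept."""
    n, nk, s_cx, s_c = item_sums(c, x)
    a = (n * s_cx - s_c * nk) / (n * nk - nk * nk)
    return a, (s_c - a * nk) / n

c0 = [9, 7, 8, 3, 4, 6, 2, 5, 1, 6]                   # hypothetical criterion scores
items = {"i": [1, 1, 1, 0, 0, 1, 0, 1, 0, 0],         # hypothetical 0/1 item columns
         "ii": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
         "iii": [0, 0, 0, 1, 0, 0, 1, 0, 1, 0]}
values = {k: n_sigma_r(c0, x) for k, x in items.items()}
first = max(values, key=values.get)                    # highest value in (8)
A1, B1 = slope_intercept(c0, items[first])
residual = [c - (A1 * x + B1) for c, x in zip(c0, items[first])]   # C_1 of (9)
print(values, first, A1, B1)
```

The residuals sum to zero, so their sum of squares is the quantity $n\sigma_{c_1}^2$ used in (12).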


Now the right-hand side of (9) represents the part of $C_0$ which is uncorrelated with item 1. The sum of the squares of the residuals $C_1$ may be written $n\sigma_{c_1}^2$. Then the correlation between the first item selected and the criterion is given by²

$$r_1^2 = 1 - \frac{n^2\sigma_{c_1}^2}{n^2\sigma_{c_0}^2} \tag{12}$$

where the subscript indicates that only item 1 is involved in the correlation.

Let us now rewrite equation (9) in the form

$$C_0 - A_1X_1 = B_1 + C_1 \tag{13}$$

Since $B_1$ is a constant, the right-hand side of (13) is uncorrelated with $X_1$. In other words, it constitutes that part of the criterion which cannot be predicted by item 1.

We next proceed to determine which of the remaining $(m - 1)$ items will best predict this unpredicted part of the criterion, that is, which of the remaining items is most highly correlated with $B_1 + C_1$. Since $B_1$ is a constant, we may neglect it and calculate the values

$$n\sigma_{c_1} r_{c_1x_k} = \frac{n\Sigma C_1X_k - (\Sigma C_1)n_k}{\sqrt{n n_k - n_k^2}} \tag{14}$$

for each of the remaining $m - 1$ items. The item which gives the highest value in (14) we designate as item number 2. This item predicts more accurately the part of the criterion not predicted by item 1 than do any of the other items.

Let us now consider for each individual an equation of the type

$$C_1 - (A_2X_2 + B_2) = C_2 \tag{15}$$

where again $A_2$ and $B_2$ have been determined by the method of least squares, the formula for $A_2$ being

$$A_2 = \frac{n\Sigma C_1X_2 - (\Sigma C_1)n_2}{n n_2 - n_2^2} \tag{16}$$

Now the right-hand side of (15) represents the part of $C_1$ which is uncorrelated with item 2. The sum of the squares of the residuals $C_2$ may be written $n\sigma_{c_2}^2$. Then the correlation between the criterion and the weighted sum of the first two items is

$$r_{12}^2 = 1 - \frac{n^2\sigma_{c_2}^2}{n^2\sigma_{c_0}^2} \tag{17}$$

where the subscript indicates that both items 1 and 2 are involved in the correlation.

We may next write equation (15) thus

$$C_1 - A_2X_2 = B_2 + C_2 \tag{18}$$

and carry through the same routine for selecting the third item. The routine may be followed until the last correlation coefficient is not materially larger than the previous one.
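The whole routine for differential weights can be sketched in a few lines of Python. This sketch rests on stated assumptions and is not the author's work-sheet procedure. It recomputes every sum directly from a hypothetical item matrix instead of building the sums cumulatively. It also ranks items by the absolute value of (19), so that an item may enter with a negative weight, whereas the text speaks only of the "highest value". At each step it reports the value of (21). The names and data are illustrative.

```python
import math

def n_sigma_r(c, x):
    """Equation (19): n * sigma_c * r_cx for a 0/1 item column x."""
    n, nk = len(c), sum(x)
    num = n * sum(ci * xi for ci, xi in zip(c, x)) - sum(c) * nk
    return num / math.sqrt(n * nk - nk * nk)

def n2_var(values):
    """n^2 sigma^2 = n * sum(v^2) - (sum v)^2."""
    return len(values) * sum(v * v for v in values) - sum(values) ** 2

def successive_residuals(c0, items, steps):
    """Differential-weight selection; returns (item index, A_e, value of (21)) per step."""
    n = len(c0)
    resid = [float(c) for c in c0]
    remaining = [k for k, x in enumerate(items) if 0 < sum(x) < n]   # skip constant items
    history = []
    for _ in range(min(steps, len(remaining))):
        # rank by |(19)| so that negatively related items may also enter
        k = max(remaining, key=lambda j: abs(n_sigma_r(resid, items[j])))
        x, nk = items[k], sum(items[k])
        num = n * sum(r * xi for r, xi in zip(resid, x)) - sum(resid) * nk
        a = num / (n * nk - nk * nk)                                   # (20)
        b = (sum(resid) - a * nk) / n                                  # least-squares intercept
        resid = [r - a * xi - b for r, xi in zip(resid, x)]            # (22)
        r_value = math.sqrt(max(0.0, 1 - n2_var(resid) / n2_var(c0)))  # (21)
        remaining.remove(k)
        history.append((k, a, r_value))
    return history

c0 = [9, 7, 8, 3, 4, 6, 2, 5, 1, 6]                   # hypothetical criterion scores
items = [[1, 1, 1, 0, 0, 1, 0, 1, 0, 0],              # hypothetical 0/1 item columns
         [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
         [0, 0, 0, 1, 0, 0, 1, 0, 1, 0]]
for k, a, r in successive_residuals(c0, items, 3):
    print(f"item {k}: weight {a:+.3f}, value of (21) = {r:.3f}")
```

Each regression of a residual on one more item can only reduce its variance, so the reported values never decrease. This is why the routine can stop once the gain is no longer material.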

1. F. C. Mills, Statistical Methods, New York, 1928, p. 386.
2. Ibid., p. 375.

Let us now consider the three basic equations involved in our technique. In general form they are

$$n\sigma_{c_e} r_{c_ex_k} = \frac{n\Sigma C_eX_k - (\Sigma C_e)n_k}{\sqrt{n n_k - n_k^2}} \tag{19}$$

$$A_{e+1} = \frac{n\Sigma C_eX_{e+1} - (\Sigma C_e)n_{e+1}}{n n_{e+1} - n_{e+1}^2} \tag{20}$$

$$r_{e+1}^2 = 1 - \frac{n^2\sigma_{c_{e+1}}^2}{n^2\sigma_{c_0}^2} \tag{21}$$


These are the three equations involved in each item selection. The subscript $e$ refers to the serial order of the item selection and not to the arbitrary serial number of the item.

For the selection of the $(e + 1)$th item there will be $m - e$ calculations of the type represented by equation (19). The selection is based only on this series of calculations and does not involve equations (20) and (21). Once the item is selected, however, these equations are employed in sequence.

Let us now see how equation (19) is built up through successive item selections. Consider first the denominator of the right-hand side of the equation. Evidently, the number of different denominators is limited to $n$, the total number of cases, since $n_k$ can take at most $n$ distinct values. These values need be calculated only once for the entire series of selections.

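Now let us examine $r_{c_ex_k}$ in the left-hand term of equation (19). We write the general form of equation (9) thus

$$C_{e-1} - A_eX_e - B_e = C_e \tag{22}$$

or transposing,

$$C_{e-1} - A_eX_e = C_e + B_e$$

Because of (22) we have

$$r_{c_ex_k} = r_{(c_{e-1} - A_ex_e - B_e)x_k}$$

but, since $B_e$ is a constant,

$$r_{(c_e + B_e)x_k} = r_{c_ex_k}, \qquad \text{so that} \qquad r_{c_ex_k} = r_{(c_{e-1} - A_ex_e)x_k}$$

Now by analogy with (19)

$$n\sigma_{(c_{e-1} - A_ex_e)}\,r_{(c_{e-1} - A_ex_e)x_k} = \frac{n\Sigma(C_{e-1} - A_eX_e)X_k - [\Sigma(C_{e-1} - A_eX_e)]\,n_k}{\sqrt{n n_k - n_k^2}} \tag{23}$$

or

$$n\sigma_{(c_{e-1} - A_ex_e)}\,r_{(c_{e-1} - A_ex_e)x_k} = \frac{(n\Sigma C_{e-1}X_k - nA_e\Sigma X_eX_k) - (\Sigma C_{e-1} - A_e\Sigma X_e)\,n_k}{\sqrt{n n_k - n_k^2}} \tag{24}$$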


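But $\Sigma X_eX_k$ is simply the number of individuals who marked both items $e$ and $k$ in a specified manner, and $\Sigma X_e$ is the number who marked item $e$ in a specified manner. If we let

$$\Sigma X_eX_k = n_{ek}, \qquad \Sigma X_e = n_e \tag{25}$$

we get from (24) and (25)

$$n\sigma_{c_e} r_{c_ex_k} = \frac{(n\Sigma C_{e-1}X_k - nA_en_{ek}) - (\Sigma C_{e-1} - A_en_e)\,n_k}{\sqrt{n n_k - n_k^2}} \tag{26}$$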


Equation (26) is most useful for calculating the numerator terms required in selecting the $(e + 1)$th item. The values $n\Sigma C_{e-1}X_k$ have all been calculated in selecting the $e$th item, as has also the value $\Sigma C_{e-1}$. To get the values $nA_en_{ek}$, the cell frequencies for the column involving the $e$th item are determined and these frequencies are multiplied by $nA_e$. It is clear from (19) and (20) that

$$A_e = \frac{n\sigma_{c_{e-1}}r_{c_{e-1}x_e}}{\sqrt{n n_e - n_e^2}} \tag{27}$$

Thus it is seen that the calculations 
are cumulative from one item selection to 
the next. It is important to take advantage 
of this fact since, at best, the number of 
calculations is large on any sizable prob- 
lem. 
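The value of (26) lies in obtaining the new numerators from quantities already on the work sheet. A small Python check, with hypothetical data and names, compares the direct form (23) with the cumulative form (26). $B_e$ is left out because a constant does not change the numerator.

```python
def numerator_direct(c_prev, a_e, x_e, x_k):
    """Numerator of (23), computed from the raw residuals C_{e-1} and item columns."""
    n = len(c_prev)
    diff = [c - a_e * xe for c, xe in zip(c_prev, x_e)]
    return n * sum(d * xk for d, xk in zip(diff, x_k)) - sum(diff) * sum(x_k)

def numerator_cumulative(n, s_c_xk, s_c, a_e, n_ek, n_e, n_k):
    """Numerator of (26), computed from stored sums and the joint frequency n_ek."""
    return (n * s_c_xk - n * a_e * n_ek) - (s_c - a_e * n_e) * n_k

c_prev = [4.5, -1.0, 2.0, 0.5, -3.0, 1.5, -2.5, 0.0]   # hypothetical residuals C_{e-1}
x_e = [1, 0, 1, 1, 0, 0, 0, 1]                         # hypothetical item e
x_k = [1, 1, 0, 1, 0, 1, 0, 0]                         # hypothetical item k
a_e = 1.75                                             # hypothetical weight A_e
n_ek = sum(p * q for p, q in zip(x_e, x_k))            # marked both items e and k
stored = dict(n=len(c_prev), s_c_xk=sum(c * q for c, q in zip(c_prev, x_k)),
              s_c=sum(c_prev), a_e=a_e, n_ek=n_ek, n_e=sum(x_e), n_k=sum(x_k))
print(numerator_direct(c_prev, a_e, x_e, x_k), numerator_cumulative(**stored))
```

Only $n_{ek}$ is new at each step. Everything else in (26) was already computed when the previous item was selected.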

We next consider the relationship between $n^2\sigma_{c_e}^2$ and $n^2\sigma_{c_{e-1}}^2$. By definition

$$n^2\sigma_{c_e}^2 = n\Sigma C_e^2 - (\Sigma C_e)^2 \tag{28}$$















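Let us rewrite equation (22) in the form

$$C_e + B_e = C_{e-1} - A_eX_e \tag{29}$$

Since $B_e$ is a constant,

$$\sigma_{c_e} = \sigma_{(c_e + B_e)}$$

but

$$\sigma_{(c_e + B_e)} = \sigma_{(c_{e-1} - A_ex_e)}$$

hence

$$\sigma_{c_e} = \sigma_{(c_{e-1} - A_ex_e)}, \qquad n^2\sigma_{c_e}^2 = n^2\sigma_{(c_{e-1} - A_ex_e)}^2$$

and

$$n^2\sigma_{(c_{e-1} - A_ex_e)}^2 = n\Sigma(C_{e-1} - A_eX_e)^2 - [\Sigma(C_{e-1} - A_eX_e)]^2 \tag{30}$$

Substituting (30) in (28) we have

$$n^2\sigma_{c_e}^2 = n\Sigma(C_{e-1} - A_eX_e)^2 - [\Sigma(C_{e-1} - A_eX_e)]^2$$

or

$$n^2\sigma_{c_e}^2 = \left[n\Sigma C_{e-1}^2 + \left(-2nA_e\Sigma C_{e-1}X_e + nA_e^2n_e\right)\right] - (\Sigma C_{e-1} - A_en_e)^2 \tag{31}$$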


The value $\Sigma C_{e-1}X_e$ in (31) is cumulatively developed for the left-hand member of the numerator term in (26). It may be picked directly from the work sheet and multiplied by $2nA_e$ to give $2nA_e\Sigma C_{e-1}X_e$. The value $nA_e^2n_e$ is easily determined. These two terms give the amount by which $n\Sigma C_{e-1}^2$ is changed to give the brackets in (31), with proper regard being given to signs.

The term $A_en_e$ gives the amount by which $\Sigma C_{e-1}$ is changed to give the parentheses in (31). Equation (31) may conveniently be written

$$n^2\sigma_{c_e}^2 = (n\Sigma C_{e-1}^2 - \alpha_e) - (\Sigma C_{e-1} - \beta_e)^2 \tag{32}$$

where

$$\alpha_e = nA_e(2\Sigma C_{e-1}X_e - A_en_e), \qquad \beta_e = A_en_e$$

Equation (32) may be regarded as the reduction formula for the standard deviations of successive criterion residuals.
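Equation (32) can be checked in the same way. In this hypothetical Python example, the residuals are updated with an arbitrary weight and intercept as in (22). The value of $n^2\sigma_{c_e}^2$ computed directly from (28) is then compared with the reduction formula (32). The formula uses only $n\Sigma C_{e-1}^2$, $\Sigma C_{e-1}$, $\Sigma C_{e-1}X_e$, $n_e$, and $A_e$.

```python
def n2_var(values):
    """Equation (28): n^2 sigma^2 = n * sum(v^2) - (sum v)^2."""
    return len(values) * sum(v * v for v in values) - sum(values) ** 2

def n2_var_reduced(n, s_c2, s_c, s_cx, a_e, n_e):
    """Equation (32), with alpha_e = nA_e(2 sum(C X_e) - A_e n_e) and beta_e = A_e n_e."""
    alpha = n * a_e * (2 * s_cx - a_e * n_e)
    beta = a_e * n_e
    return (n * s_c2 - alpha) - (s_c - beta) ** 2

c_prev = [4.5, -1.0, 2.0, 0.5, -3.0, 1.5, -2.5, 0.0]   # hypothetical C_{e-1}
x_e = [1, 0, 1, 1, 0, 0, 0, 1]                         # hypothetical item e
a_e, b_e = 1.75, -0.25                                 # hypothetical A_e and B_e
c_e = [c - a_e * x - b_e for c, x in zip(c_prev, x_e)]  # equation (22)
direct = n2_var(c_e)
reduced = n2_var_reduced(len(c_prev), sum(c * c for c in c_prev), sum(c_prev),
                         sum(c * x for c, x in zip(c_prev, x_e)), a_e, sum(x_e))
print(direct, reduced)
```

The intercept $B_e$ does not appear in (32), which reflects the step $\sigma_{c_e} = \sigma_{(c_e + B_e)}$ in the derivation.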





III. UNIT ITEM WEIGHTS 


The procedure which we have developed 
provides for determining a weight for each 
item. As a matter of fact, however, it is 
usually not desirable to have a weighted 
scoring stencil for a test composed of ob- 
jective test items. Unit weights may be 
preferable to differential weights. It is 
possible to modify the procedure outlined 
above in such a way as to select approxi- 
mately that combination of items which will 
give the highest correlation with the cri- 
terion when unit weights are employed. 

By employing unit weights we mean that 
the item, if marked in a specified manner,
is assigned a weight of either plus or minus 
unity. It is the function of the analysis, 
not only to select the item, but also to de- 
termine the sign of the weight. What the 
sign will be depends on the convention which 
was adopted in the preliminary scoring of 
the items. If the material is of the personality type and "yes" or "plus" was checked,
the item may be scored 1; if not, zero. The 
opposite convention might just as well have 
been adopted. On the other hand, when con- 
sidering the general intelligence or infor- 
mation type of item, the correct answer is 
usually definitely, or at least arbitrarily, 
specified. 

In the former instance, if in the analysis a "yes" or "plus" would come out negative, then "no" or "minus" or some other
form of negation may be regarded as the cor- 
rect answer. If in the latter type of item 
we were to obtain a negative weight, we 
should perhaps hesitate to include the item 
in the combination--that is, we should hesi- 
tate to give unit credit for an obviously in- 
correct answer, even though by so doing we 
increase the validity of the test. In any 
case, this inconsistency need not concern us 
in developing the procedure for items with 
unit weights. 

Let us begin with equation (8)

$$n\sigma_{c_0} r_{c_0x_k} = \frac{n\Sigma C_0X_k - (\Sigma C_0)n_k}{\sqrt{n n_k - n_k^2}} \tag{8}$$

and let us select as item number 1 that item which gives the largest absolute value in (8). Our procedure for determining $r_{c_0x_1}$ is the same as the one developed earlier in the discussion. This value we obtain simply from (8) by writing

$$r_{c_0x_1} = \frac{n\Sigma C_0X_1 - (\Sigma C_0)n_1}{n\sigma_{c_0}\sqrt{n n_1 - n_1^2}} \tag{8a}$$

Next, we consider equation (9)

$$C_0 - (A_1X_1 + B_1) = C_1 \tag{9}$$

where $A_1$ has the value given by (11). We find which of the remaining items gives the highest correlation in absolute value with $c_1$ by the formula

$$n\sigma_{c_1} r_{c_1x_k} = \frac{n\Sigma C_1X_k - (\Sigma C_1)n_k}{\sqrt{n n_k - n_k^2}} \tag{14}$$

From this point on, however, our procedure differs from that previously outlined. We observe whether the value in (14) is positive or negative and write

$$C_0 = A_2(\pm X_1 \pm X_2) + B_2 \tag{33}$$

where the sign of $X_1$ is the same as that of (8a) and the sign of $X_2$ is the same as that of (14) when $k = 2$. The value $A_2$ in (33) is still determined by the least square method, but it now becomes

$$A_2 = \frac{n\Sigma C_0(\pm X_1 \pm X_2) - (\Sigma C_0)\,\Sigma(\pm X_1 \pm X_2)}{n\Sigma(\pm X_1 \pm X_2)^2 - [\Sigma(\pm X_1 \pm X_2)]^2}$$

or

$$A_2 = \frac{n(\pm\Sigma C_0X_1 \pm \Sigma C_0X_2) - (\Sigma C_0)(\pm n_1 \pm n_2)}{n(n_1 \pm 2n_{12} + n_2) - (\pm n_1 \pm n_2)^2} \tag{34}$$

To get $r_2$ we have

$$r_2 = r_{c_0(\pm x_1 \pm x_2)} = \frac{n(\pm\Sigma C_0X_1 \pm \Sigma C_0X_2) - (\Sigma C_0)(\pm n_1 \pm n_2)}{n\sigma_{c_0}\sqrt{n(n_1 \pm 2n_{12} + n_2) - (\pm n_1 \pm n_2)^2}} \tag{35}$$

while the value of the adjusted coefficient is obtained as previously described.

To get the third item, we write

$$C_0 - A_2(\pm X_1 \pm X_2) = C_2 + B_2 \tag{36}$$

and get for all the remaining items the function

$$n\sigma_{c_2} r_{c_2x_k} = \frac{n\Sigma C_2X_k - (\Sigma C_2)n_k}{\sqrt{n n_k - n_k^2}}$$

In general, after $e$ items have been selected,

$$n\sigma_{c_e} r_{c_ex_k} = n\sigma_{c_e} r_{[c_0 - A_e(\pm x_1 \pm \cdots \pm x_e)]x_k} \tag{37}$$

Then

$$\begin{aligned} n\sigma_{c_e} r_{c_ex_k} &= \frac{n\Sigma[C_0 - A_e(\pm X_1 \pm \cdots \pm X_e)]X_k - n_k\Sigma[C_0 - A_e(\pm X_1 \pm \cdots \pm X_e)]}{\sqrt{n n_k - n_k^2}} \\ &= \frac{n[\Sigma C_0X_k - A_e(\pm n_{1k} \pm \cdots \pm n_{ek})] - n_k[\Sigma C_0 - A_e(\pm n_1 \pm \cdots \pm n_e)]}{\sqrt{n n_k - n_k^2}} \end{aligned} \tag{38}$$

where the sign of any $n_{ik}$ or $n_i$ is the same as that of $n\sigma_{c_{i-1}}r_{c_{i-1}x_i}$ as determined from preceding calculations.

The item which gives the largest value, either positive or negative, in equation (38) is the $(e + 1)$th item.

























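Let us now consider the general expression for the weight $A_e$ of the combined first $e$ items selected. We write this expression

$$A_e = \frac{nK_e - (\Sigma C_0)N_e}{nN_{ee} - N_e^2} = \frac{\sigma_{c_0}\,r_{c_0(\pm x_1 \pm \cdots \pm x_e)}}{\sigma_{(\pm x_1 \pm \cdots \pm x_e)}} \tag{39}$$

where

$$K_e = \pm\Sigma C_0X_1 \pm \cdots \pm \Sigma C_0X_e \tag{40a}$$

$$N_e = \pm n_1 \pm \cdots \pm n_e \tag{40b}$$

$$\begin{aligned} N_{ee} = {} & n_{11} \pm n_{12} \pm \cdots \pm n_{1e} \\ & \pm n_{21} + n_{22} \pm \cdots \pm n_{2e} \\ & \cdots \\ & \pm n_{e1} \pm n_{e2} \pm \cdots + n_{ee} \end{aligned} \tag{40c}$$

with $n_{ii} = n_i$ and $n_{ij} = n_{ji}$.

The signs involved in $K_e$ and $N_e$ are determined as previously indicated. Those for any $n_{ij}$ involved in $N_{ee}$ are plus or minus, depending on whether items $i$ and $j$ have been given like or unlike signs.

By means of equations (39) and (40) we may now write

$$A_{e+1} = \frac{n(K_e \pm \Sigma C_0X_{e+1}) - (\Sigma C_0)(N_e \pm n_{e+1})}{n\left[N_{ee} + 2\sum_{i=1}^{e}(\pm n_{i(e+1)}) + n_{e+1}\right] - (N_e \pm n_{e+1})^2} \tag{41}$$

or from (39)

$$A_{e+1} = \frac{\sigma_{c_0}\,r_{c_0(\pm x_1 \pm \cdots \pm x_{e+1})}}{\sigma_{(\pm x_1 \pm \cdots \pm x_{e+1})}}$$

The correlation $r_{e+1}$ is given by

$$r_{e+1} = \frac{n(K_e \pm \Sigma C_0X_{e+1}) - (\Sigma C_0)(N_e \pm n_{e+1})}{n\sigma_{c_0}\sqrt{n\left[N_{ee} + 2\sum_{i=1}^{e}(\pm n_{i(e+1)}) + n_{e+1}\right] - (N_e \pm n_{e+1})^2}} \tag{42}$$

In other words, $r_{e+1}$ is obtained simply by dividing the numerator of (41) by the product of the square root of the denominator and $n\sigma_{c_0}$.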

Equations (38) and (39) give the basic formulas for selecting items by the unit weight method. Formula (39) may be employed with the aid of equations (40), or its cumulative form as given by (41) may be used.
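The unit-weight routine of (38) to (42) can also be sketched in Python. The sketch uses a hypothetical in-memory item matrix rather than Hollerith cards and recomputes the joint frequencies $n_{ik}$ on demand. It updates $K_e$, $N_e$, and $N_{ee}$ cumulatively, as in (40) and (41). The names and data are illustrative only.

```python
import math

def unit_weight_selection(c0, items, steps):
    """Unit-weight item selection; returns (item index, sign, A_e, r_e) per step."""
    n, s_c = len(c0), sum(c0)
    n_sigma_c0 = math.sqrt(n * sum(c * c for c in c0) - s_c ** 2)

    def joint(p, q):
        """n_ik for two item columns, or sum(C0 * X_k) when p is the criterion."""
        return sum(a * b for a, b in zip(p, q))

    chosen, signs, results = [], [], []
    K = N = N_ee = 0          # (40a), (40b), (40c) before any item is chosen
    A = 0.0                   # with A = 0 and no items chosen, (38) reduces to (8)
    remaining = [k for k, x in enumerate(items) if 0 < sum(x) < n]

    def value_38(k):
        x, nk = items[k], sum(items[k])
        signed_nik = sum(s * joint(items[i], x) for i, s in zip(chosen, signs))
        num = n * (joint(c0, x) - A * signed_nik) - nk * (s_c - A * N)
        return num / math.sqrt(n * nk - nk * nk)

    for _ in range(min(steps, len(remaining))):
        k = max(remaining, key=lambda j: abs(value_38(j)))
        s = 1 if value_38(k) > 0 else -1          # sign of the unit weight
        x = items[k]
        N_ee += 2 * sum(si * s * joint(items[i], x) for i, si in zip(chosen, signs)) + sum(x)
        K += s * joint(c0, x)
        N += s * sum(x)
        chosen.append(k)
        signs.append(s)
        remaining.remove(k)
        num, den = n * K - s_c * N, n * N_ee - N * N
        A = num / den                                                   # (39)
        results.append((k, s, A, num / (n_sigma_c0 * math.sqrt(den))))  # r_e, (42)
    return results

c0 = [9, 7, 8, 3, 4, 6, 2, 5, 1, 6]                   # hypothetical criterion scores
items = [[1, 1, 1, 0, 0, 1, 0, 1, 0, 0],              # hypothetical 0/1 item columns
         [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
         [0, 0, 0, 1, 0, 0, 1, 0, 1, 0]]
for k, s, A_e, r_e in unit_weight_selection(c0, items, 2):
    print(f"item {k}, sign {s:+d}: A = {A_e:.3f}, r = {r_e:.3f}")
```

Because the composite is the signed sum $\pm X_1 \pm \cdots \pm X_e$, the value returned for $r_e$ equals the ordinary Pearson correlation between the criterion and that sum.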


IV. MECHANICAL OPERATIONS 


To clarify the mechanical operations involved in the Unit Weight method, we shall outline a convenient procedure consisting of three work sheets, A, B, and C.

Formula (38) provides the basis of work 
sheets (A) and (B). 

On work sheet B, column (0) gives the arbitrary serial number of the items. Column (1) gives the frequency with which each item was marked in a specified manner, or $n_k$. Column (2) gives the sum of the criterion scores for those who marked each item in the specified manner, or $\Sigma C_0X_k$. Hollerith tabulating machines simplify the computation of these figures. Column (3) is $n\Sigma C_0X_k/100$, which is column (2) multiplied by $n/100$. The division by 100 is simply to reduce the size of the figures in order to facilitate the calculations. Since column (3) is a function of the criterion scores and of the number of cases, division by a different power of 10 might prove more advantageous with another set of data.

The remaining part of work sheet B is divided into sections of 3 columns each: an (a), a (b), and a (c) column. A complete section is required for the selection of each successive item. Column (a) gives

$$\frac{n_k\left[\Sigma C_0 - A_e(\pm n_1 \pm \cdots \pm n_e)\right]}{100}$$

which is the right-hand term of the numerator in (38) divided by 100. The division by 100 is to make the terms comparable to those in column (3). The expression in brackets is constant for any (a) column, and therefore the (a) column is simply this constant times column (1).





Column (b) is

$$\frac{nA_e}{100}(\pm n_{1k} \pm \cdots \pm n_{ek}),$$

or the expression in parentheses in the left-hand term of the numerator of (38) multiplied by $nA_e/100$, where the division by 100 is to make the terms comparable to those in columns (3) and (a).

The constant multipliers, $\Sigma C_0 - A_e(\pm n_1 \pm \cdots \pm n_e)$, for the (a) columns and the constant multipliers, $nA_e$, for the (b) columns are obtained from work sheet C, columns (10) and (11) respectively, as will be subsequently explained. The quantities $(\pm n_{1k} \pm \cdots \pm n_{ek})$ are obtained from work sheet A, as will also be shown.

Column (c) is obtained by subtracting columns (a) and (b) from column (3). This gives the numerator term of (38) divided by 100.

According to formula (38) it is necessary to divide this expression by $\sqrt{n n_k - n_k^2}$ in order to determine which is the largest $n\sigma_{c_e}r_{c_ex_k}$, i.e., which item should be selected next. In practice, however, when we work with a large number of items, it is convenient to group the items according to the frequency with which they are marked in a specified manner and to analyze similar frequencies as a unit. If within each unit the frequencies do not vary too widely, the denominator terms will not vary a great deal. If the highest number (in absolute value) in a (c) column is markedly higher than the next highest figure, the division by the denominator need not be carried out. In any case, it need, as a rule, be carried out for only the two or three highest differences in order to determine which item to select.
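The bookkeeping for one row of a work-sheet-B section can be written out as follows. This Python sketch rounds each column to the nearest whole number. That convention appears consistent with the printed work sheets, but the text does not state it, so it is an assumption. The function name is hypothetical. The example arguments are taken from work sheets A, B, and C of the illustration.

```python
def sheet_b_row(n, n_k, sum_c0_xk, mult_a, mult_b, sheet_a_value):
    """Columns (3), (a), (b), (c) of work sheet B for one item in one section.

    mult_a: (a) multiplier from column (10) of work sheet C (sum C0 in section I);
    mult_b: nA_e from column (11); sheet_a_value: signed joint frequencies from
    work sheet A (0 in section I).
    """
    col3 = round(n * sum_c0_xk / 100)
    col_a = round(n_k * mult_a / 100)
    col_b = round(mult_b * sheet_a_value / 100)
    return col3, col_a, col_b, col3 - col_a - col_b   # (c) = (3) - (a) - (b)

n = 44
print(sheet_b_row(n, 18, 276, 936, 0, 0))        # item 3, section I
print(sheet_b_row(n, 17, 303, 1117, 442, -10))   # item 1, section II
print(sheet_b_row(n, 19, 491, 1117, 442, -7))    # item 6, section II
print(sheet_b_row(n, 22, 409, 1056, 331, -7))    # item 8, section IV
```

The largest (c) value in absolute terms, before division by $\sqrt{n n_k - n_k^2}$, marks the candidate for the next item.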

Let us now consider work sheet A. The purpose of this sheet is to provide the values of the type $(\pm n_{1k} \pm \cdots \pm n_{ek})$ for the (b) columns in work sheet B. Each $n_{ik}$ gives the frequency with which those marking item $i$ in a specified manner also marked item $k$ in a specified manner. By using a Hollerith tabulating card for each item and punching in the field the cases for which the item was marked, the desired frequencies are obtained simply by considering the selected item with each of the remaining items, in turn, and counting the common holes.
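Counting common holes has a direct modern analogue. If each item's card is represented as a bit set of the individuals who marked it, $n_{ik}$ is the number of bits set in the intersection of two cards. The Python sketch below illustrates this under that assumption, with hypothetical names and data. It builds two signed columns of work sheet A and forms the cumulative column by addition.

```python
def to_card(column):
    """Pack a 0/1 item column into an integer bit set (bit i set if individual i marked it)."""
    card = 0
    for i, mark in enumerate(column):
        if mark:
            card |= 1 << i
    return card

def common_holes(card_i, card_k):
    """n_ik: the number of individuals who marked both items."""
    return bin(card_i & card_k).count("1")

def sheet_a_column(cards, selected, sign, exclude=()):
    """Signed frequencies sign * n_{selected,k} for every item k not yet selected."""
    return {k: sign * common_holes(cards[selected], c)
            for k, c in cards.items() if k != selected and k not in exclude}

cards = {k: to_card(col) for k, col in {
    1: [1, 0, 1, 1, 0, 1], 2: [1, 1, 0, 1, 0, 0],
    3: [0, 1, 1, 1, 1, 0], 4: [1, 0, 0, 0, 1, 1]}.items()}
col1 = sheet_a_column(cards, 3, -1)                 # first item selected, negative sign
col2 = sheet_a_column(cards, 1, +1, exclude={3})    # second item selected, positive sign
col2b = {k: col1[k] + col2[k] for k in col2}        # cumulative column, as in (2b)
print(col1, col2, col2b)
```

The cumulative column supplies the parenthesized quantity in column (b) of work sheet B.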





As previously indicated, a third work sheet, C, is employed for the calculation of the constant multiplier

$$\frac{\Sigma C_0 - A_e(\pm n_1 \pm \cdots \pm n_e)}{100}$$

for each (a) column on work sheet B, and the constant multiplier $nA_e/100$ for each (b) column. The correlation $r_e$ is also computed from work sheet C.

The calculations for work sheet C are based on formulas (39), (40a), (40b), and (40c).

Column (0) gives the selection order of the items.

Column (0′) gives the arbitrary serial number of the items.

Column (1) is $K_e$, as given by (40a).

Column (2) is $N_e$, as given by (40b).

Column (3) is $N_{ee}$, as given by (40c).

Column (4) is $nK_e - (\Sigma C_0)N_e$, the numerator term of equation (39). In terms of columns (1) and (2) it is simply $n \times (1) - (\Sigma C_0) \times (2)$.

Column (5) is $nN_{ee} - N_e^2$, the denominator term in (39). In terms of columns (2) and (3) it is $n \times (3) - (2)^2$.

Column (6) is the square root of column (5).

Column (7) is $n\sigma_{c_0} \times (6)$, or $n\sigma_{c_0}\sqrt{nN_{ee} - N_e^2}$.

Column (8) is column (4) divided by column (7), or

$$\frac{nK_e - (\Sigma C_0)N_e}{n\sigma_{c_0}\sqrt{nN_{ee} - N_e^2}}$$

This column gives the value of $r_e$ as given by equation (42).

Column (9) is (4) ÷ (5) and gives the value of $A_e$ in (39).

Column (10) is $(\Sigma C_0) - (2) \times (9)$, which gives the value of the constant multiplier for the (a) columns on work sheet B.

Column (11) is $n \times (9)$, or $nA_e$, which is the value of the constant multiplier for the (b) columns of work sheet B.
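The column rules for work sheet C translate directly into code. The Python sketch below (hypothetical function name) computes columns (4) to (11) from columns (1) to (3) and the constants printed under the sheet. It keeps full precision throughout. The printed sheet rounds column (6) before forming column (7), so the last digit of (7) may differ: line 1 gives about 10838 rather than 10837.

```python
import math

def sheet_c_line(K, N, N_ee, n, sum_c0, n_sigma_c0):
    """Columns (4)-(11) of work sheet C from columns (1)-(3)."""
    col4 = n * K - sum_c0 * N           # numerator of (39)
    col5 = n * N_ee - N ** 2            # denominator of (39)
    col6 = math.sqrt(col5)
    col7 = n_sigma_c0 * col6
    col8 = col4 / col7                  # r_e, equation (42)
    col9 = col4 / col5                  # A_e, equation (39)
    col10 = sum_c0 - N * col9           # multiplier for the (a) columns
    col11 = n * col9                    # multiplier for the (b) columns
    return col4, col5, col6, col7, col8, col9, col10, col11

constants = dict(n=44, sum_c0=936, n_sigma_c0=501)   # values printed under work sheet C
for K, N, N_ee in [(-276, -18, 18), (215, 1, 23)]:   # lines (1) and (2)
    c = sheet_c_line(K, N, N_ee, **constants)
    print(c[0], c[1], f"{c[4]:.3f}", f"{c[5]:.2f}", round(c[6]), round(c[7]))
```

For lines (1) and (2) the rounded results agree with the printed values .434, 10.05, 1117, 442 and .535, 8.43, 928, 371.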

The operations proceed in cycles through the three work sheets, in general from A to B to C.

It should be noted, however, that work sheet A is not required for the first item selection, since the values given by it involve two or more items. Hence, work sheet B, section I, column (b), is filled with zeros. For a similar reason, $\Sigma C_0$ is the constant multiplier for the (a) column.

Item (3) gives the largest absolute 
value in work sheet B, section I, column 
(c). Hence, this is the first item select- 
ed. 

Calculating line (1) of work sheet C, it is found that $r_1 = .434$, $\Sigma C_0 - A_1(-n_3) = 1117$, and $nA_1 = 442$.

The values $-n_{3k}$ for column (1) of work sheet A are determined next. These values are the frequencies with which those marking item (3) in a specified manner marked each of the remaining items in a specified manner. The sign is negative because of the value -47 in work sheet B, section I, column (c).

The next step is to calculate Section II of work sheet B. Column (a) is column (1) multiplied by 1117/100. Column (b) is 442/100 times the values from column (1), work sheet A. From column (c) it can be seen that item (6) is the next item selected. The sign is positive.

From line (2) of work sheet C, we find that $r_2 = .535$, $\Sigma C_0 - A_2(-n_3 + n_6) = 928$, and $nA_2 = 371$.

We next obtain the values $n_{6k}$ for work sheet A, column (2). In column (2b) of work sheet A, we add columns (1) and (2), giving $(-n_{3k} + n_{6k})$.

We are now ready to compute Section III 
of work sheet B. For this cycle, column (a) 
is 928/100 times column (1) and column (b) 
is 371/100 times column (2b) of work sheet A.
We calculate the adjacent column (c) and se- 
lect item (2) from it. The sign is minus. 

The next step is to calculate line (3) of work sheet C. From this, $r_3 = .599$, $\Sigma C_0 - A_3(-n_3 + n_6 - n_2) = 1056$, and $nA_3 = 331$.

Column (3) of work sheet A gives $-n_{2k}$, or the frequency with which those marking item (2) in a specified manner also marked each of the remaining items in the same manner. The sum of column (2b) and column (3) gives column (3b), which is $(-n_{3k} + n_{6k} - n_{2k})$.

Line (3) of work sheet C and column (3b) of work sheet A are used in computing Section IV of work sheet B. From column (c), item (8) is selected.

In this manner succeeding items are selected.
This process may be outlined by cycles in the following manner:

Cycle 1   Work sheet B, Section I
          Work sheet C, Line (1)
Cycle 2   Work sheet A, Column (1)
          Work sheet B, Section II
          Work sheet C, Line (2)
Cycle 3   Work sheet A, Columns (2) and (2b)
          Work sheet B, Section III
          Work sheet C, Line (3)
Cycle 4   Work sheet A, Columns (3) and (3b)
          Work sheet B, Section IV
          Work sheet C, Line (4)
Etc.
WORK SHEET A

Item    (1)    (2)   (2b)    (3)   (3b)
         3      6             2
  1     -10     4     -6     -5    -11
  2      -9     8     -1
  3
  4      -8     7     -1     -9    -10
  5     -10     7     -3     -8    -11
  6      -7
  7      -9     7     -2     -8    -10
  8      -9     8     -1     -6     -7
  9     -13     9     -4    -12    -16





P. Horst 


WORK SHEET B

                             I                 II                III                IV
Item   (1)   (2)   (3)     a    b    c      a    b    c      a    b    c      a    b    c
Selected                   item 3           item 6           item 2           item 8
  1     17   303   133   159    0  -26    190  -44  -13    158  -22   -3    180  -36    ?
  2     17   285   125   159    0  -34    190  -40  -25    158   -4  -29
  3     18   276   121   168    0  -47
  4     18     ?   142   168    0  -26    201  -35  -24    167   -4  -21    190  -33    ?
  5     18   303   133   168    0  -35    201  -44  -24    167  -11  -23    190  -36    ?
  6     19   491   216   178    0  +38    212  -31  +35
  7     20   376   165   187    0  -22    223  -40  -18    186   -7  -14    211  -33  -13
  8     22   409   180   206    0  -26    246  -40  -26    204   -4  -20    232  -23  -29
  9     26   494   217   243    0  -26    290  -57  -16    241  -15   -9    275  -53   -5
WORK SHEET C

Line  (0)    (1)   (2)   (3)     (4)    (5)     (6)     (7)    (8)    (9)   (10)  (11)
 1     3    -276   -18   +18    4704    468   21.63   10837   .434  10.05   1117   442
 2     6    +215    +1   +23    8524   1011   31.80   15932   .535   8.43    928   371
 3     2     -68   -16   +42   11984   1592   39.90   19990   .599   7.53   1056   331
 4     8    -477   -38   +78   14580   1988   44.59   22396   .652   7.33   1215   323

n = 44      Σc = 936      nσc = 501
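The selection step in Sections I to III of work sheet B can be re-run with a short script. This is a minimal sketch, not Horst's own procedure. The rule a = (1) × K/100, b = nA/100 × (work-sheet-A column) and c = (3) − a − b is inferred from the printed numbers and is not stated in this excerpt. Section I is assumed to use K = Σc = 936 with b = 0. A few work-sheet-A entries (row 5 of columns (2) and (2b), for instance) are read back from the rounded products in work sheet B. The helper names `residual_section` and `select_item` are hypothetical.

```python
"""Sketch of the item-selection step on work sheet B (inferred arithmetic)."""


def residual_section(n, col3, k, na=0, w=None, exclude=()):
    """Return {item: c} for one section of work sheet B.

    Assumed (inferred) rule:
        a = round(n_i * k / 100)   -- k: 936 (Σc) in Section I, else work sheet C col. (10)
        b = round(na * w_i / 100)  -- w_i from work sheet A (b = 0 in Section I)
        c = col3_i - a - b
    """
    residuals = {}
    for item, n_i in n.items():
        if item in exclude:              # items already selected are skipped
            continue
        a = round(n_i * k / 100)
        b = round(na * w[item] / 100) if w is not None else 0
        residuals[item] = col3[item] - a - b
    return residuals


def select_item(residuals):
    """Pick the item with the largest absolute residual, keeping its sign."""
    item = max(residuals, key=lambda i: abs(residuals[i]))
    return item, residuals[item]


# Work sheet B, columns (1) and (3), as printed.
n = {1: 17, 2: 17, 3: 18, 4: 18, 5: 18, 6: 19, 7: 20, 8: 22, 9: 26}
col3 = {1: 133, 2: 125, 3: 121, 4: 142, 5: 133, 6: 216, 7: 165, 8: 180, 9: 217}

# Work sheet A: column (1) = -n_3i, column (2b) = -n_3i + n_6i.
col_1 = {1: -10, 2: -9, 4: -8, 5: -10, 6: -7, 7: -9, 8: -9, 9: -13}
col_2b = {1: -6, 2: -1, 4: -1, 5: -3, 7: -2, 8: -1, 9: -4}

section_1 = residual_section(n, col3, k=936)
section_2 = residual_section(n, col3, k=1117, na=442, w=col_1, exclude={3})
section_3 = residual_section(n, col3, k=928, na=371, w=col_2b, exclude={3, 6})

for label, section in (("I", section_1), ("II", section_2), ("III", section_3)):
    print(label, select_item(section))
```

Run as written, it reports items 3, 6 and 2 with residuals −47, +35 and −29. These match the selections and signs described in the text.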








IMPROVED OVERLAPPING METHODS FOR DETERMINING 
VALIDITIES OF TEST ITEMS 


by 


John A. Long 
University of Toronto 


One of the more familiar of the techniques for determining validities of test items is the Vincent Overlapping method.¹ By this method the validity value for any item is the per cent of those failing the item who have higher criterion scores than the median criterion score of those who pass the item. Consider, for example, a test administered to 100 students, with 30 failing a particular item. If, of these 30, 6 have criterion scores higher than the median score of the 70 who get the item correct, the Vincent validity value for the item will be 6/30 of 100, or 20. The smaller this measure of overlapping, the greater is the discriminating power of the item, and so the higher is its validity. Usually the criterion score is simply the total score on the test of which the item forms a part.
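The Vincent rule can be written out directly. This is a minimal sketch, and the function name `vincent_overlap` is hypothetical. It is checked against the four hypothetical items of Table I (criterion scores 95 down to 50, ten students) rather than against real test data.

```python
import statistics


def vincent_overlap(scores, passed):
    """Vincent Overlapping value: per cent of the 'fails' whose criterion
    score is higher than the median criterion score of the 'passes'.
    Smaller values indicate higher validity."""
    pass_scores = [s for s, ok in zip(scores, passed) if ok]
    fail_scores = [s for s, ok in zip(scores, passed) if not ok]
    median_pass = statistics.median(pass_scores)
    above = sum(1 for s in fail_scores if s > median_pass)
    return 100 * above / len(fail_scores)


# Criterion scores and responses for Items 1 and 2 of Table I (P = True).
scores = list(range(95, 45, -5))          # 95, 90, ..., 50
item_1 = [c == "P" for c in "PPFFPPPFFF"]
item_2 = [c == "P" for c in "FFPPPFFFPP"]
print(vincent_overlap(scores, item_1), vincent_overlap(scores, item_2))
```

Both items come out at 40, even though one favors high scorers and the other low scorers. That is the weakness the following paragraphs point out.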

The Vincent method makes rather inadequate use of the data to which it is applied, and while it enjoys the virtue of ease of computation it frequently leads to very serious inexactitudes. This will become apparent from a study of the very hypothetical set-up in Table I. Here we have a record of the passes (P) and "fails" (F) on 4 items by 10 students with criterion scores ranked in descending order. Consider Items 1 and 2. In each of these the median criterion score of those (P) who answer the item correctly is 75. In each, also, 2/5 or 40 per cent of the "fails" have criterion scores higher than 75, so their Vincent validity values are the same, 40 each. Yet the most superficial consideration makes it obvious that the two items are not equal in discriminating power. In fact, while Item 1 favors, in general, students with high criterion scores, Item 2 does just the reverse. Actually, the validity of Item 1 is positive and fairly





TABLE I

FOUR HYPOTHETICAL ITEMS WITH VINCENT OVERLAPPING, BISERIAL r AND LONG OVERLAPPING VALIDITY VALUES

Criterion score   ITEM 1   ITEM 2   ITEM 3   ITEM 4
      95            P        F        P        P
      90            P        F        P        P
      85            F        P        P        P
      80            F        P        P        F
      75            P        P        P        F
      70            P        F        F        F
      65            P        F        F        F
      60            F        F        F        F
      55            F        P        F        P
      50            F        P        F        P

V.O.               40       40        0        0
Bis. r           +.566    -.305   +1.088    +.218
L.O.             +.520    -.280   +1.000    +.360














high, while that of Item 2 is negative. We 
get a truer picture from their Biserial r 
values, which are +.566 and -.305, respec- 
tively. 

Notice likewise Items 3 and 4. Since 
no "fails" have criterion scores higher than 
the median for the "passes", the Vincent 
validity value for each is zero. Yet here 
again it is obvious that the items differ 
in discriminating power. The discriminat- 
ing power of Item 3 is perfect, while that 
of Item 4 is only slightly positive. Their 
Biserial r's are +1.088 and +.218, respectively. (The value greater than unity for the Biserial r of Item 3 is explained by the fact that the Biserial r technique presupposes a normal distribution of criterion scores, a condition which is not met in our data.)
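The point about Item 3 can be checked numerically. The sketch below uses the common textbook form of biserial r, (M_p − M_t)/σ_t × p/y, where y is the unit-normal ordinate at the point dividing p and q. That form, the use of the population standard deviation, and the name `biserial_r` are assumptions, not the author's stated computation. The results therefore agree with Table I only to about the third decimal.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores, passed):
    """Textbook biserial r: (M_p - M_t) / sigma_t * p / y."""
    p = sum(passed) / len(passed)                  # proportion passing
    m_p = mean(s for s, ok in zip(scores, passed) if ok)
    m_t = mean(scores)
    sigma_t = pstdev(scores)                       # assumed: population s.d.
    unit = NormalDist()
    y = unit.pdf(unit.inv_cdf(p))                  # ordinate at the p/q split
    return (m_p - m_t) / sigma_t * p / y


scores = list(range(95, 45, -5))                   # criterion scores of Table I
item_3 = [c == "P" for c in "PPPPPFFFFF"]
print(round(biserial_r(scores, item_3), 3))        # exceeds 1
```

The ten evenly spaced criterion scores are far from normally distributed. A perfectly discriminating item is therefore pushed slightly past unity.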








1. Leona E. Vincent, A Study of Intelligence Test Elements, Contributions to Education, No. 152 (New York: Bureau of 
Publications, Teachers College, Columbia University, 1924), 36 pp. 






The writer has succeeded in devising an overlapping method of determining item-validities which makes more refined use of the data presented, and which is not subject to the grave inconsistencies possible in the Vincent method. For present purposes we shall call it the Long Overlapping (L.O.) method. The formula,

$$\text{L.O.} = 1 - \frac{2\,\Sigma(\text{"passes" below "fails"})}{(N_p)(N_f)}$$

can best be explained by giving an account of the manner in which it operates. With the test papers ranked in descending order of criterion scores, let us consider an individual item. To each wrong response, or failure (F), is assigned a value equal to the number of correct responses, or passes (P), which fall below it in the ranking. The sum of these values (Σ "passes" below "fails") for the item under investigation is the measure of its overlapping. By dividing it by the product of the number of "passes" (N_p) and the number of "fails" (N_f), the measure becomes a ratio which varies from 0 for perfect validity to 1 for perfect negative validity, and which yields values that are independent of the number of test papers employed. If we carry the development of the formula one stage further by doubling this ratio and subtracting the result from 1, we obtain values which parallel the ordinary coefficient of correlation in their general significance: they vary from +1 for perfect positive validity, through zero, to -1 for perfect negative validity. The actual procedure of applying the formula is illustrated in the two hypothetical examples in Table II.

TABLE II

TWO HYPOTHETICAL EXAMPLES WITH ILLUSTRATIONS OF THE PROCEDURE FOR CALCULATING LONG OVERLAPPING VALIDITY VALUES

EXAMPLE 1

92 - P
87 - P
86 - F (6)
81 - P
75 - P
72 - P
69 - F (3)
68 - P
62 - F (2)
60 - F (2)
54 - P
53 - P
46 - F (0)

Σ "passes" below "fails" = 0 + 2 + 2 + 3 + 6 = 13
N_p = 8
N_f = 5

$$\text{L.O.} = 1 - \frac{2\,\Sigma(\text{"passes" below "fails"})}{(N_p)(N_f)} = 1 - \frac{2(13)}{(8)(5)} = 1 - .65 = .35$$

Biserial r = .37

EXAMPLE 2

[?] - P
[?] - F (5)
79 - F (5)
77 - F (5)
72 - P
71 - F (4)
68 - F (4)
67 - P
64 - F (3)
62 - F (3)
60 - F (3)
56 - P
55 - P
51 - F (1)
43 - P

$$\text{L.O.} = 1 - \frac{2\,\Sigma(\text{"passes" below "fails"})}{(N_p)(N_f)} = 1 - \frac{2(33)}{(6)(9)} = 1 - 1.22 = -.22$$

Biserial r = -.30
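The counting procedure of Table II translates into a single upward pass over the ranked responses. This is a minimal sketch with hypothetical function names. Each example is transcribed as a string of "P" and "F" in descending order of criterion score; in Example 2 only the positions matter, so the two illegible scores do not affect the result.

```python
def passes_below_fails(ranked):
    """Sum, over every 'F', of the number of 'P's ranked below it.

    `ranked` lists the responses ('P' or 'F') in descending order of
    criterion score, as in Tables I and II."""
    total = 0
    passes_below = 0
    for response in reversed(ranked):     # walk upward from the lowest score
        if response == "P":
            passes_below += 1
        else:
            total += passes_below
    return total


def long_overlapping(ranked):
    """L.O. = 1 - 2 * sum("passes" below "fails") / (N_p * N_f)."""
    n_p = ranked.count("P")
    n_f = ranked.count("F")
    return 1 - 2 * passes_below_fails(ranked) / (n_p * n_f)


example_1 = "PPFPPPFPFFPPF"    # Table II, Example 1 (92 down to 46)
example_2 = "PFFFPFFPFFFPPFP"  # Table II, Example 2
print(passes_below_fails(example_1), round(long_overlapping(example_1), 2))
print(passes_below_fails(example_2), round(long_overlapping(example_2), 2))
```

This prints 13 and .35 for Example 1 and 33 and -.22 for Example 2, matching the worked values. Walking upward keeps the count of passes below each fail without re-scanning the list.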




























The high degree of correspondence between Biserial r and Long Overlapping validity values may be inferred from the fact that for 110 items on a test in General Science the correlation between the two was .94. For the same data the correlation between Long Overlapping and Vincent Overlapping values was -.86, and between Biserial r and Vincent, -.82. The negative coefficients are explained by the fact that high validity is indicated by high Biserial r and Long Overlapping values and by low Vincent values. Not only do Biserial r and Long Overlapping measures tend to correlate highly, but they tend to correspond closely in their absolute values. The average difference between the two values for the 110 General Science items was .042.

As compared with the Biserial r technique, the Long Overlapping method has a number of advantages: (1) It appears to be slightly more effective in selecting items yielding scores which correlate highly with the criterion, and in discarding items yielding scores which correlate poorly with the criterion (see Table III). (2) The labor of computation is somewhat less arduous. In order to simplify the work, the responses should be tabulated and an adding machine used in making the summations. (3) It employs concepts all of which are readily grasped. And (4) it depends on no statistical assumptions such as normality of distribution of criterion scores, and it does not require the use of any statistical tables. The Long Overlapping method is at a disadvantage in that, first, it derives its values entirely from rank-order considerations and gives no exact recognition to the absolute values of the criterion scores, and second, it lacks Biserial r's background of honorable statistical theory. As compared with the Vincent method it is superior in everything except the matter of time required in computation.

As far as the implications of the formu- 
las are concerned, the Long Overlapping meth- 
od resembles Biserial r in that the validity 
values are not influenced by the difficulty 
of the items per se. This is an advantage 
where one is building a test with the defi- 
nite plan in view of having its items scaled 
over a wide range from easy to difficult. 
Either of these methods so operates that an 










item's chance of achieving a high validity 
value is not conditioned by its position in 
the difficulty scale.

Such a characteristic, however, is a 
disadvantage where one desires to use the 
validity values to select a set of items on 
which the total scores made by a group of 
testees will yield the highest possible cor- 
relation with their criterion scores. Where 
this is the object we should employ validity 
techniques which favor items of 50 per cent 
difficulty in preference to items which are 
very easy or very hard. If 100 students take a test and only 1 fails to get Item A, then Item A makes only 99 differentiations: it indicates that, as far as that particular aspect of knowledge is concerned, a certain individual is inferior to each of 99 other individuals. If 50 students fail in Item B, then Item B makes 2500 differentiations: it indicates that, in the achievement it is attempting to measure, each of a certain 50 individuals is inferior to each of a certain other 50 individuals. It surely follows that, other things being equal, Item B gives more information than does Item A and should accordingly have a higher validity value.
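The count of differentiations is simply the number of pass/fail pairs, N_p × N_f. A few lines make the comparison of Items A and B concrete and show where the count peaks for 100 students. This is a minimal sketch, and the helper name `differentiations` is hypothetical.

```python
def differentiations(n_pass, n_fail):
    """Number of pass/fail pairs an item separates: each 'fail' is shown
    inferior to each 'pass'."""
    return n_pass * n_fail


N = 100
print(differentiations(99, 1), differentiations(50, 50))   # Items A and B

# Over all possible splits of 100 students, the count peaks at 50 per cent difficulty.
best_fail_count = max(range(1, N), key=lambda f: differentiations(N - f, f))
print(best_fail_count)
```

The first line prints 99 and 2500, the two figures in the text. The second prints 50, which is why a validity measure that rewards differentiations favors items of 50 per cent difficulty.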

This concept of making the validity value of an item partly a function of the number of differentiations it makes has been embodied in a further elaboration of the overlapping technique. Long Overlapping values are independent of item difficulties. If we multiply these values by the factor

$$\frac{(N_p)(N_f)}{\left(\frac{N_T}{2}\right)^2}$$

where N_T is the total number of test papers (N_p + N_f), we obtain values which tend to correlate positively with the number of differentiations which the items make. We shall call these values Long Weighted Overlapping validity values, and the formula is

$$\text{L.W.O.} = \left[1 - \frac{2\,\Sigma(\text{"passes" below "fails"})}{(N_p)(N_f)}\right]\frac{(N_p)(N_f)}{\left(\frac{N_T}{2}\right)^2}$$

This simplifies to

$$\text{L.W.O.} = \frac{(N_p)(N_f) - 2\,\Sigma(\text{"passes" below "fails"})}{\left(\frac{N_T}{2}\right)^2}$$
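The simplified L.W.O. formula can be computed with the same counting helper used for L.O. This is a minimal sketch with hypothetical function names, and it takes N_T to be the number of test papers, N_p + N_f. The `easy_item` ranking is hypothetical: nine passes with the single fail ranked lowest.

```python
def passes_below_fails(ranked):
    """Sum over each 'F' of the number of 'P's ranked below it."""
    total, passes_below = 0, 0
    for response in reversed(ranked):     # walk upward from the lowest score
        if response == "P":
            passes_below += 1
        else:
            total += passes_below
    return total


def long_overlapping(ranked):
    """L.O. = 1 - 2 * sum("passes" below "fails") / (N_p * N_f)."""
    n_p, n_f = ranked.count("P"), ranked.count("F")
    return 1 - 2 * passes_below_fails(ranked) / (n_p * n_f)


def long_weighted_overlapping(ranked):
    """L.W.O. = [(N_p)(N_f) - 2 * sum("passes" below "fails")] / (N_T / 2) ** 2."""
    n_p, n_f = ranked.count("P"), ranked.count("F")
    n_t = n_p + n_f                       # assumed: total number of test papers
    return (n_p * n_f - 2 * passes_below_fails(ranked)) / (n_t / 2) ** 2


easy_item = "PPPPPPPPPF"    # hypothetical: 9 passes, the one 'fail' ranked lowest
print(long_overlapping(easy_item), long_weighted_overlapping(easy_item))
```

The easy item keeps a perfect L.O. of 1.0, but its L.W.O. drops to .36 because it makes only 9 of a possible 25 differentiations. For an item passed by exactly half the papers the weighting factor is 1, so L.W.O. equals L.O.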


The sole purpose of item-validation is 
the gaining of information which will permit 
one to so select items for a test that 
scores obtained on that test will have the 
highest possible correlation with criterion 
scores. Experiment indicates that the Long 
Weighted Overlapping method is superior to 
any of the other three in this particular. 
Five hundred vocabulary items were admin- 
istered to 144 university students and vari- 
ous item-validity techniques were applied to 
the results. The best 100 and the poorest 
100 items were selected by each method and 
the scores obtained on each 100 were corre- 
lated against the criterion scores, which 
in this case were the total scores obtained on the 500 items. The results are set out in Table III. Of course these correlations are spurious, since in each case the 100 items constitute a fifth of the 500 against which they are correlated. All coefficients,





TABLE III

COEFFICIENTS OF CORRELATION BETWEEN TOTAL SCORES ON 500 VOCABULARY ITEMS AND SCORES ON 100 ITEMS SELECTED BY VARIOUS TECHNIQUES

                 VINCENT   BISERIAL r   L.O.   L.W.O.
Best 100           .961       .955      .963    .970
Poorest 100        .718       .603      .573    .476























however, are spurious to the same extent, so this fact does not detract from their value for purposes of comparison. The methods are arranged from left to right in ascending order of effectiveness. While they differ but slightly in their ability to select the best 100 items, they vary widely in their ability to select the poorest 100. The practical implications are obvious. If one is faced with the task of selecting a battery of the best items from a great number at his disposal, the choice of validity technique is not a matter of great importance. But where one has only a small surplus to play with and must decide which items to discard, validity methods vary widely in their effectiveness, and of those considered here the Long Weighted Overlapping method is superior.

While the validity techniques outlined here have unique features which justify their presentation, there is no intention to claim for them superiority over the dozen or more other methods not considered in this article.

















Volume II, No, 3 










THE NEGATIVE SUGGESTION EFFECT OF THE FALSE 
STATEMENT IN THE TRUE-FALSE TEST 
by 
Howard Y. McClusky 
University of Michigan 


In comparing the relative merits of the 
true-false and the essay examination it ap- 
parently has not occurred to many writers 
that the true-false examination may exert a 
detrimental influence on the learner. The 
essay question depending on recall does not 
confront the examinee with conflicting mate- 
rial, while the true-false question depend- 
ing on recognition may conceivably confuse 
the mind of the student with false impres- 
sions. Since most examinations come at the 
end of a unit of subject matter, such false 
impressions may, if uncorrected, counteract 
some of the work of the preceding instruction. In any case, no comprehensive appraisal of the true-false examination can be
made until the extent of a possible detri- 
mental influence has been established. It 
is the purpose of this article to deal with 
this problem by reporting some experiments 
designed to measure the negative suggestion 
effect of the true-false test. 

Myers found clear evidence of the persistence of an initial error in arithmetic.¹ He writes, "As far as habit process is concerned there is no difference between a wrong answer and a right one. Errors are not negative, they are just as positive as correct responses." Curtis and the writer,² in experimenting with a modified form of the true-false test, discovered that ability to recognize a false statement as false did not necessarily imply an equal ability to amend the false statement so as to make it true. On the other hand, H. H. and E. M. Remmers³ found no evidence that the true-false test leaves a residue of false associations.




TABLE I

A COMPARISON OF THE ERRORS CONTRIBUTED BY THE TRUE AND FALSE STATEMENTS IN THE FINAL EXAMINATIONS OF FOUR DIFFERENT COURSES*

                          Students making more            Total        Errors made on
                          errors on                       number of
Class                     True   False   Equal            students     True   False   Total
Educational Psychology     22     78       8                108         146    291     437
Vocational Guidance        10     46       6                 62         253    412     665
Hygiene 101                40     61      26                127         508    585    1093
Zoology 1                   1     37       2                 40         267    508     775
Total                      73    222      42                337        1174   1796    2970

*Each set of examination questions contained an equal number of true and false statements.

The preceding table is read as follows: 22 of the 108 students in educational psychology made more errors on the true statements than on the false; 78 made more errors on the false statements, while 8 made an equal number of errors on both. In the same course, out of a total of 437 errors, 146 were made on true statements and 291 on false.


PRELIMINARY INVESTIGATION 


The relevance of the problem of this 
investigation may be readily demonstrated 
by an analysis of the errors on true-false 
examinations. In this study such an analy- 
sis was employed to determine whether or 
not the false statement contributed more 
errors than the true statement in the final 





1. G. C. Myers, "Persistence of Errors in Arithmetic," Journal of Educational Research, X (June, 1924), pp. 19-28.

2. H. Y. McClusky and F. D. Curtis, "A Modified Form of the True-False Test," Journal of Educational Research, XIV, pp. 215-224.

3. H. H. Remmers and E. M. Remmers, "The Negative Suggestion Effect of True-False Examination Questions," Journal of Educational Psychology, XVII (January, 1926), pp. 52-56.
















examinations of four different courses at 
the University of Michigan. The results of 
this analysis are contained in Table I on 
the preceding page. 

The data of this table indicate that about three times as many students miss more false statements than true ones (222 vs. 73), and that of the questions missed, about one and a half times as many are false as true (1796 vs. 1174). This preliminary analysis indicates definitely that in the true-false examination the false statement presents much greater difficulties than the true statement. This striking fact makes the investigation of the negative suggestion effect of the false statement in the true-false examination particularly relevant.
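The two ratios quoted above follow from the column totals of Table I. The sketch below recomputes them from the per-course rows as read from the table. It is only an arithmetic check, and the name `TABLE_I` is hypothetical.

```python
# Table I as printed: (students with more errors on true, on false, equal,
#                      errors on true statements, errors on false statements)
TABLE_I = {
    "Educational Psychology": (22, 78, 8, 146, 291),
    "Vocational Guidance": (10, 46, 6, 253, 412),
    "Hygiene 101": (40, 61, 26, 508, 585),
    "Zoology 1": (1, 37, 2, 267, 508),
}

more_true = sum(row[0] for row in TABLE_I.values())
more_false = sum(row[1] for row in TABLE_I.values())
errors_true = sum(row[3] for row in TABLE_I.values())
errors_false = sum(row[4] for row in TABLE_I.values())

print(more_false, more_true, round(more_false / more_true, 2))
print(errors_false, errors_true, round(errors_false / errors_true, 2))
```

The ratios come out at 3.04 for students and 1.53 for errors, supporting "about three times" and "about one and a half times."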


THE MAJOR INVESTIGATION 


Materials: The materials consisted of 
two forms of a true-false test and one form 
of a multiple-choice test. Each test con- 
tained 80 items, half of which in the true- 
false tests were true and half were false. 
In order that they should contain identical subject matter, the three tests were constructed in the following manner: The multiple-choice test of 80 statements, with four choices for each statement, was constructed first. Then the 80 statements were converted into true-false form by completing each multiple-choice item with only one of the original choices. The two reversed forms of the true-false test were devised by making the 40 true statements of Form A false for Form B, and the 40 false statements of Form A true for Form B. The mode of construction may be readily illustrated by the ensuing example:

Multiple choice: In a large unselected population, the I.Q. theoretically possessed by the largest number of persons is: (a) .90, (b) 1.00, (c) 1.10, (d) 1.20.

True-False Form A: In a large unselected population the I.Q. theoretically possessed by the largest number of persons is 1.00.....(True False)

True-False Form B: In a large unselected population the I.Q. theoretically possessed by the largest number of persons is .90.....(True False)



















Method: The experimental procedure 
consisted in administering the true-false 
and multiple-choice tests to four sections 
of introductory educational psychology at 
the University of Michigan as the required 
examination over the first half of this 
course. Sections I and III took form A of 
the true-false test followed immediately by 
the multiple-choice test, while sections II 
and IV took form B of the true-false test 
followed immediately by the same multiple- 
choice test. 

The intent of the method was to provide 
experimental materials identical in subject 
matter, and variable only in form, on the 
theory that the differential response must 
be due to the form and not the content of 
the original statement. Furthermore, in or- 
der to compare the influence of the true 
and false element, there must be an equal 
number of both types of question, and each 
statement (or item of content) must be cast 
into both forms. And, finally, the multi- 
ple-choice test was given last in order to 
register the effect of the initial true- 
false questions. 


TABULATION OF RESULTS 


Since the essential idea of the inves- 
tigation consists of the influence of the 
true-false on the multiple-choice test, the 
scheme of tabulation was designed to show 
this relationship. Each statement of the 
true-false test was compared with the cor- 
responding statement in the multiple-choice 
test. There were four possible comparisons. A statement could be: (1) correct in both tests (tabulated C C, i.e., correct in the true-false and correct in the multiple-choice test); (2) incorrect in both
tests (tabulated I I, i.e., incorrect in the 
true-false and incorrect in the multiple- 
choice test); (3) correct in the true-false 
and incorrect in the multiple-choice test 
(tabulated C I); or (4) incorrect in the 
true-false and correct in the multiple- 
choice test (tabulated I C). Under each of 
the preceding four comparisons a tabulation 
was made as to whether the statement in the 
true-false test was, in its correct form, 
true or false. The tabulation of the re- 
sults is illustrated in Table II. 
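The tabulation scheme amounts to a two-way classification of every item for one subject. The first key is the comparison (C C, I I, C I or I C), and the second is whether the statement was keyed true or false. This is a minimal sketch: the function name `tabulate` and the six-item record are hypothetical.

```python
from collections import Counter


def tabulate(tf_correct, mc_correct, keyed_true):
    """Tally one subject's items by comparison (C C, I I, C I, I C) and by
    whether the true-false statement was keyed true (T) or false (F)."""
    counts = Counter()
    for tf_ok, mc_ok, is_true in zip(tf_correct, mc_correct, keyed_true):
        comparison = ("C" if tf_ok else "I") + " " + ("C" if mc_ok else "I")
        counts[(comparison, "T" if is_true else "F")] += 1
    return counts


# Hypothetical six-item record for one subject.
tf_correct = [True, True, False, False, True, False]
mc_correct = [True, False, False, True, True, False]
keyed_true = [True, False, True, False, False, True]
print(sorted(tabulate(tf_correct, mc_correct, keyed_true).items()))
```

Applied to a full 80-item record, the eight counts correspond to one row of a table such as Table II.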







TABLE II

SAMPLE TABULATION OF TRUE-FALSE STATEMENTS UNDER THE FOUR COMPARISONS IN THE CASE OF ONE SUBJECT

                       The four comparisons
Name     T-F       C C       I I       C I       I C
         form      T   F     T   F     T   F     T   F
Shaw     A         24  12    5   11    5   9     ?   ?

C C   correct (t-f), correct (m-c)
I I   incorrect (t-f), incorrect (m-c)
C I   correct (t-f), incorrect (m-c)
I C   incorrect (t-f), correct (m-c)

Mr. Shaw, the subject, was a member of one of the sections, I or III, which took Form A of the true-false test. Out of 80 items, 36 (24 plus 12) were correct (C C) in both tests; of the 36 items, 24 in the original true-false test were true and 12 were false. Out of 80 items, 16 (5 plus 11) were incorrect in both tests (I I); of the 16, 5 in the original true-false test were true and 11 were false. Out of 80 items, 12 (5 plus 9) were correct in the true-false test and incorrect in the multiple-choice test (C I); of the 12, 5 in the original true-false test were true and 9 were false, etc.

The data from the four sections were tabulated according to the method just described. The averages of the four sections in the true-false and multiple-choice tests, and in the four comparisons, are given in Table III.

TABLE III

THE AVERAGE SCORES IN THE TRUE-FALSE AND MULTIPLE-CHOICE TESTS AND IN THE TRUE-FALSE AND MULTIPLE-CHOICE COMPARISONS FOR EACH SECTION

                                                   True-false multiple-choice comparisons
Section and           Miller   True-   Multiple-   C C          I I          C I          I C
number of subjects    Mental   false   choice      T     F      T     F      T     F      T     F
                      test     test    test
Form A true-false test
I (21)                91.5     54.8    56.1        25.8  19.3   5.3   9.8    3.1   6.1    4.2   ?
III (27)              91.0     52.2    50.2        24.3  15.5   7.0   11.2   ?     ?      ?     ?
Average I & III (48)  91.2     53.3    52.8        ?     17.9   6.3   10.6   ?     ?      ?     4.6
Form B true-false test
II (33)               85.0     50.4    49.8        26.3  12.2   4.0   13.9   4.1   6.9    4.9   ?
IV (14)               90.6     50.1    50.0        25.8  13.1   ?     13.3   4.5   7.8    ?     7.8
Average II & IV (47)  86.3     50.3    49.8        26.1  12.5   4.3   13.7   ?     ?      ?     6.6
Average four
sections (95)         89.0     51.8    51.3        25.5  15.1   5.3   12.2   4.0   7.3    5.1   5.6

The above table is read as follows: Section I, composed of 21 students, made an average Miller Mental Test score of 91.5, an average true-false test score of 54.8 and an average multiple-choice test score of 56.1. Of the statements correct in both tests (C C), 25.8 were true in the original t-f test and 19.3 were false in the same test, etc.

The data of Table III are based on averages. While in this instance the differences between the true and false elements are significantly wide (with the possible exception of comparison I C), an average taken alone presents, at best, only a partial statistical picture. The original data of the experiment have therefore been rearranged in a supplementary tabulation.

This supplementary tabulation consists in counting, for each section, the number of cases in each comparison in which the higher score was in favor of the true or the false question. For example, Table III indicates that for section I in the C C comparison the advantage for the true as against the false question is 25.8 to 19.3. For the same section and the same comparison, the













same advantage for the true question was present in the results of 19 of the 21 members of the group. In other words, the supplementary tabulation was designed to indicate the number of subjects in each section and comparison showing the trends already revealed by the averages of Table III. The results of this rearrangement of the original data are presented in Table IV.
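The supplementary tabulation reduces each subject to a three-way verdict per comparison: more true than false, more false than true, or equal. This is a minimal sketch with a hypothetical function name, `count_advantage`. The first record uses Mr. Shaw's C C counts from Table II, and the other two subjects are hypothetical.

```python
def count_advantage(subjects, comparison):
    """For one comparison, count subjects with more true than false items,
    more false than true, and equal numbers (the supplementary tabulation)."""
    more_true = more_false = equal = 0
    for counts in subjects:
        t = counts.get((comparison, "T"), 0)
        f = counts.get((comparison, "F"), 0)
        if t > f:
            more_true += 1
        elif f > t:
            more_false += 1
        else:
            equal += 1
    return more_true, more_false, equal


subjects = [
    {("C C", "T"): 24, ("C C", "F"): 12},   # Mr. Shaw's C C counts (Table II)
    {("C C", "T"): 10, ("C C", "F"): 15},   # hypothetical subject
    {("C C", "T"): 8, ("C C", "F"): 8},     # hypothetical subject
]
print(count_advantage(subjects, "C C"))
```

Each triple corresponds to one T, F, 0 group in a row of Table IV.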





TABLE IV

THE NUMBER OF SUBJECTS IN EACH SECTION SHOWING AN ADVANTAGE FOR THE TRUE OR FALSE STATEMENT IN EACH OF THE FOUR COMPARISONS

                          True-false multiple-choice comparisons
Section and               C C          I I          C I          I C
number of subjects        T   F   0    T   F   0    T   F   0    T   F   0
Form A true-false test
I (21)                    19  2   0    ?   18  1    ?   ?   ?    15  5   1
III (27)                  23  2   2    2   25  0    3   20  4    8   16  3
Total I & III (48)        42  4   2    7   40  1    8   35  5    23  21  4
Form B true-false test
II (33)                   32  1   0    0   32  1    1   28  3    11  18  4
IV (14)                   12  2   0    0   14  0    4   10  0    4   10  0
Total II & IV (47)        44  3   0    0   46  1    5   39  3    15  28  4
Total four sections (95)  86  7   2    7   86  2    13  74  8    38  49  8

The above table is read as follows: In section I, which took form A of the t-f test, 19 out of 21 subjects had marked more true than false statements in the C C comparison and only 2 had marked more false than true statements, etc.

Table IV affords a striking confirmation of the results of Table III. Both sets of data indicate a marked disadvantage for the false statement in three of the four comparisons. The data of the fourth comparison (I C) are not conclusive. Since the averages of the entire experiment are corroborated by the averages of each section, in discussing the major outcomes it will be sufficient to refer only to the total averages.

If the true-false and multiple-choice papers of the entire group of 95 subjects are compared, of the questions that are correct in both tests (C C) an average of 25.5 of the true-false questions are true, and an average of 15.1 are false (Table III). This same advantage for the true question would hold for 86 of the 95 subjects, an advantage for the false question would hold for only seven subjects, and in the case of the two remaining subjects the results would be equal (Table IV). Or, stated in more general terms: when a statement is correctly recognized as being true or false and when, out of four choices, it is correctly completed, such a statement, in its true-false form, is much more likely to be true than false. Tables III and IV also indicate that when a statement is incorrectly recognized as being true or false and when, out of four choices, it is incorrectly completed, such a statement in its true-false form is much more likely to be false than true. Furthermore, when a statement is correctly recognized as being true or false, and when it is incorrectly completed, it is more likely to be false than true.

Finally, if in either or both tables (III and IV) the averages for the comparisons C C and C I are added for one total, and the averages for the comparisons I I and I C are added for another total, there is ample evidence to support the following statement: When a statement is first correctly or incorrectly recognized as being true or false, it is much more difficult to select, out of four choices, the proper completion for a false statement than to do the same thing for a true statement.

The discussion may be briefly summarized by stating that the evidence is definitely unfavorable for the false statement of the true-false test. No results of the intelligence test and none of the total scores on the true-false and multiple-choice tests account for this outcome. In addition, no explanation can be found in the relative difficulty of the true-false and multiple-choice tests. Furthermore, as already explained, the procedure allowed variability only in the form and not in the content of the question, thus ruling out differences in difficulty inherent in the subject matter. The only factor remaining is the nature of the false statement itself, which apparently introduces a "confusional element" into the performance of the subject. This generalization must, of course, be limited to the present investigation. For example, this study has dealt only with the immediate effect of the false statement. Great caution must therefore be exercised in extending the conclusion to the problem of delayed performance.

But if the general significance of the evidence is accepted, one important application is inevitable: The responsibility of the instructor and the student should not stop with the administration of the true-false test. Provision should be made for the correction of the test after it has been taken. In the case of a final examination, the true-false test should ideally be employed not in the last period of the course but a period or two before, so that the last meeting may be devoted to a discussion of the test itself. The true-false test is undoubtedly a valuable technique, but for instructional purposes this investigation suggests that its value would be greatly enhanced if some method were adopted to mitigate the probable "confusional element" inherent in the false statement of the true-false examination.


SUMMARY 


Two sets of data are reported relating to the negative suggestion effect of the false statement in the true-false test. The preliminary data consisted of an analysis of the errors in the final examinations of four different courses at the University of Michigan. This analysis revealed that false statements contributed a much greater number of errors than true statements. The major series of data was secured by administering a true-false test followed immediately by a parallel multiple-choice test to four sections of a class in educational psychology at the University of Michigan. The procedure was designed to present experimental materials identical in subject matter and variable in form in such a way that the differential response was due only to the form and not the content of the statement. In this manner the negative influence of the false statement in the true-false test was isolated. The evidence clearly indicated that the immediate effect of the false statement was negative and detrimental. The objective of this investigation did not include the problem of the delayed effect of the true-false statement. If, however, the foregoing conclusion is accepted, its application is obvious: The negative effect of the true-false test should be counteracted. One proposed method is that the student should correct the errors on his examination, either on his own initiative or in collaboration with his instructor.




A GRAPHICAL METHOD FOR COMPUTING THE STANDARD
ERROR OF BISERIAL r
by
Walter J. McNamara
and
Jack W. Dunlap
Fordham University
New York City


Various bi-serial correlation coefficients for determining the relationship between two functions, one of which is continuous and distributed in several classes, the other arbitrarily dichotomized into two categories, have been proposed by different writers. Pearson, 1909, gives the following formula:¹

$$ r_{\text{bi-serial}} = \frac{M_p - M_T}{\sigma} \cdot \frac{p}{Z} \qquad (1) $$

where

$M_T$ = mean of the total distribution
$M_p$ = mean of the category with the smaller frequency
$\sigma$ = standard deviation of the total group
$p$ = proportion of cases in category $M_p$; $p < q$
$Z$ = the ordinate of the normal curve at the point of dichotomy.
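A minimal Python sketch of formula (1), assuming raw scores and a parallel list of category flags (both hypothetical inputs). It takes $\sigma$ as the population standard deviation of the total group and obtains $Z$ from the unit normal curve with the standard library's `statistics.NormalDist`; on small or non-normal samples the result can exceed 1.

```python
from statistics import NormalDist, fmean, pstdev

def pearson_biserial(scores, in_category):
    """Bi-serial r by formula (1): (M_p - M_T) / sigma * p / Z.

    scores      -- the continuous measures for all N cases
    in_category -- parallel booleans, True for cases in the category whose
                   mean is M_p (the text takes this to be the smaller one)
    """
    selected = [x for x, flag in zip(scores, in_category) if flag]
    p = len(selected) / len(scores)
    m_p = fmean(selected)
    m_t = fmean(scores)
    sigma = pstdev(scores)          # standard deviation of the total group
    # Z: ordinate of the unit normal curve at the point that divides the
    # distribution into proportions p and q.
    std = NormalDist()
    z = std.pdf(std.inv_cdf(p))
    return (m_p - m_t) / sigma * p / z

scores = [1, 2, 3, 4, 5, 6, 7, 8]
upper = [x >= 7 for x in scores]    # hypothetical dichotomy, p = .25
print(round(pearson_biserial(scores, upper), 3))   # 1.03
```

Reversing the flags gives the same value with the opposite sign, since $p(M_p - M_T) = -q(M_q - M_T)$.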


Kelley, 1924, offers a slightly different function, claiming that it is simpler to use.² His formula is

$$ r_{\text{bi-serial}} = \frac{(M_q - M_p)\,pq}{\sigma Z} \qquad (2) $$

where

$r$, $\sigma$, $Z$ and $p$ are the same as above
$q = 1 - p$
$M_p$ and $M_q$ are the means of the two dichotomies.





A third formula has been proposed by Richardson and Stalnaker, 1933, using slightly different assumptions as to the form of the distribution underlying the dichotomized variable.³ This formula is

$$ r_{\text{bi-serial}} = \frac{M_p - M_q}{\sigma}\sqrt{pq} \qquad (3) $$

where

$M_p$ = the mean of one category
$M_q$ = the mean of the other category
$p$ = the proportion of measures in the $M_p$ category
$q = 1 - p$
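Because $M_T = pM_p + qM_q$, formula (1) can be rewritten as $(M_p - M_q)pq/(\sigma Z)$, so formulas (1) and (3) differ only by the factor $\sqrt{pq}/Z$. The sketch below, with hypothetical function names, computes formula (3) and converts it to the scale of formula (1) under that identity.

```python
from math import sqrt
from statistics import NormalDist, fmean, pstdev

def richardson_stalnaker(scores, in_category):
    """Formula (3): (M_p - M_q) / sigma * sqrt(pq)."""
    group_p = [x for x, flag in zip(scores, in_category) if flag]
    group_q = [x for x, flag in zip(scores, in_category) if not flag]
    p = len(group_p) / len(scores)
    q = 1 - p
    return (fmean(group_p) - fmean(group_q)) / pstdev(scores) * sqrt(p * q)

def to_pearson_scale(r3, p):
    """Convert a formula (3) value to formula (1): r1 = r3 * sqrt(pq) / Z."""
    std = NormalDist()
    z = std.pdf(std.inv_cdf(p))     # normal ordinate at the point of dichotomy
    return r3 * sqrt(p * (1 - p)) / z

scores = [1, 2, 3, 4, 5, 6, 7, 8]
upper = [x >= 7 for x in scores]    # hypothetical dichotomy, p = .25
r3 = richardson_stalnaker(scores, upper)
print(round(r3, 3), round(to_pearson_scale(r3, 0.25), 3))   # 0.756 1.03
```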


A fourth formula has been developed by Kennedy, 1933, on the assumption that the two categories in the dichotomized function are strictly discontinuous, as in the classification male, female.⁴ This formula is


_M, - =*/4 
. bi-serial — o Dp (4) 
where
$M_p$, $M_q$, $r$, $\sigma$, $q$ and $p$ are the same as above.


In order to interpret a correlation coefficient it is desirable to have a measure of the magnitude of its sampling error. Soper, 1914, gives the following formula for the standard error of Pearson's formula (1):⁵





1. Karl Pearson, "On a New Method of Determining Correlation between a Measured Character A and a Character B, of Which Only the Percentage of Cases Wherein B Exceeds or Falls Short of a Given Intensity is Recorded for Each Grade of A", Biometrika, VII (1909).
2. T. L. Kelley, Statistical Method (New York: Macmillan Company, 1924), p. 248.
3. M. W. Richardson and J. M. Stalnaker, "A Note on the Bi-Serial r in Test Research", Journal of Genetic Psychology, VIII (1933), pp. 463-465.
4. J. L. Kennedy, an unpublished paper.
5. H. E. Soper, "On the Probable Error of the Bi-Serial Expression of the Correlation Coefficient", Biometrika, X (1914).









$$ \sigma_{r_{\text{bi-serial}}} = \frac{1}{\sqrt{N}}\left[ r^4 + r^2\left(\frac{pq\,x^2}{Z^2} + \frac{(p - q)\,x}{Z} - \frac{5}{2}\right) + \frac{pq}{Z^2} \right]^{1/2} \qquad (5) $$

where

$x$ = the deviate of the normal curve at the point of dichotomy, and $p < q$
$N$ = the total number of cases.
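The following sketch evaluates formula (5) as reconstructed above. It assumes $x$ is taken as the non-negative deviate that cuts off the smaller proportion $p$; with that convention the result is unchanged if $p$ and $q$ are interchanged, because $x$ then changes sign along with $p - q$. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def soper_exact_se(r, p, n):
    """Standard error of Pearson's bi-serial r by formula (5).

    r -- the bi-serial coefficient
    p -- proportion of cases in the smaller category (p < q)
    n -- total number of cases N
    """
    q = 1 - p
    std = NormalDist()
    x = std.inv_cdf(q)      # normal deviate at the point of dichotomy (>= 0 when p <= .5)
    z = std.pdf(x)          # ordinate Z at that point
    bracket = r**4 + r**2 * (p * q * x**2 / z**2 + (p - q) * x / z - 2.5) + p * q / z**2
    return sqrt(bracket / n)

print(round(soper_exact_se(0.0, 0.5, 1000), 4))   # 0.0396
```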








This formula is lengthy and very laborious to compute, and as an approximation Soper proposes

$$ \sigma_{r_{\text{bi-serial}}} = \frac{1}{\sqrt{N}}\left(\frac{\sqrt{pq}}{Z} - r^2\right) \qquad (6) $$

The difference between the values obtained by the use of these two formulas is shown in Table I for certain values of $p$ and $r$.

TABLE I
THE DIFFERENCE BETWEEN VALUES FOR σ OF BISERIAL r COMPUTED BY SOPER'S EXACT FORMULA AND BY HIS APPROXIMATION FORMULA FOR VARIOUS VALUES OF r AND p

Sections for p = .500, .309, .159, .067, .023, .006, each with columns (a) and (b).
[The body of the table is not legible in the source; the only surviving entries are 1.92 and 1.86.]

In each section column (a) is the value for the exact formula (5) and column (b) the value for the approximation formula (6).

It should be noted that the error only reaches 5% when r = .50, p = .006; or r = .75, p = .067; or r = 1.00, p = .159. Also, for all values of r below .25 and p greater than .01, the maximum error is 2%.

These formulas of Soper, (5) and (6), are strictly applicable only to Pearson's "bi-serial r" (1). However, the short formula (6) may be used in most cases to obtain an approximate sampling error of the coefficients proposed by Kelley, by Richardson and Stalnaker, and by Kennedy.

Even when the short formula (6) is used, the computation entails a great deal of labor. This difficulty may be overcome by means of the accompanying nomographs. The comparative accuracy of the determinations is shown in Table II. Column (a) of Table II gives the value of the standard error of bi-serial r by formula (5), and column (b) gives the values computed by the nomograph.

TABLE II
... MACHINE COMPARED WITH THE VALUES FOR FORMULA 6 DETERMINED BY THE NOMOGRAPHS

  r     p      N     (a) σr   (b) σr   Error
 .00   .50   1000    .040     .040     .000
 .18   .42    450    .056     .056     .000
 [two rows not legible in the source]
 .82   .32    872    .031     .034     .002
 .51   .27    671    .042     .046     .004
 .75   ...    520    .032     .034     .001
 .91   ...     57    .096     .096     .000
 .23   .19    912    .019     .016     .001

In using the nomographs the method is as follows: On Nomograph No. 1 find r on the left-hand scale and p on the center scale. Connect these two points by means of a straight edge and read the result f(p, r) where the line cuts the third scale at the right. Then enter Nomograph No. 2 with this value of f(p, r) on the center scale and find the appropriate N on the left-hand scale. Connect these two points as above. Where the straight edge cuts the right-hand scale, the result $\sigma_r$ is read.

In a similar manner it is possible to solve for N, p, or r when $\sigma_r$ and any two of the other functions are known. A fine line scratched on transparent celluloid is a great aid in accurately reading the values.
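A sketch of formula (6) and of the reverse use of the nomographs. It assumes, as an interpretation rather than a statement from the text, that the intermediate quantity $f(p, r)$ read from Nomograph No. 1 is $\sqrt{pq}/Z - r^2$, so that Nomograph No. 2 simply divides by $\sqrt{N}$; solving for $N$ is then a matter of squaring $f(p, r)/\sigma_r$. All names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def f_pr(r, p):
    """sqrt(pq)/Z - r**2; assumed here to be the f(p, r) of Nomograph No. 1."""
    std = NormalDist()
    z = std.pdf(std.inv_cdf(p))     # normal ordinate at the point of dichotomy
    return sqrt(p * (1 - p)) / z - r**2

def soper_approx_se(r, p, n):
    """Formula (6): sigma of bi-serial r = f(p, r) / sqrt(N)."""
    return f_pr(r, p) / sqrt(n)

def cases_needed(r, p, se):
    """Solve formula (6) for N, as Nomograph No. 2 can be used in reverse."""
    return (f_pr(r, p) / se) ** 2

se = soper_approx_se(0.51, 0.27, 671)
print(round(se, 3))                          # 0.042
print(round(cases_needed(0.51, 0.27, se)))   # 671
```

The example inputs (r = .51, p = .27, N = 671) are taken from one row of Table II.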














[Nomograph No. 1 (scales: r, p, f(p, r)) and Nomograph No. 2 (scales: N, f(p, r), σ of bi-serial r).]


A NEW TECHNIQUE FOR MACHINE COMPUTATION OF 
COEFFICIENTS OF CORRELATION 
by 
Marc J. Feldstein 
Western Reserve University 
Cleveland, Ohio 


Statistical technicians have devised various means of reducing the amount of work involved in computations by the use of calculating machines. Of these means the most important is the so-called "continuous process." The term, continuous process, means performing a given operation upon the entire population and obtaining the values needed for a final computation of a statistic without clearing the machine. In the following pages a description is given of certain time-saving techniques which have been extensively used in the laboratory at Western Reserve University since 1930 and which have proved a boon to those who must compute intercorrelations of several variables in populations ranging from 25 to 300.

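The formula for the product-moment coefficient of correlation may be written:¹

$$ r_{12} = \frac{\sigma_1^2 + \sigma_2^2 - \sigma_{D_{1-2}}^2}{2\sigma_1\sigma_2} $$

In this formula $D_{1-2}$ designates the difference $X_1 - X_2$. If $\sigma_1 = \sigma_2 = \sigma$, the formula becomes

$$ r_{12} = 1 - \frac{\sigma_{D_{1-2}}^2}{2\sigma^2} $$

Since $\sigma_{D_{1-2}}^2 = \Sigma D_{1-2}^2 / N$ when the two distributions have the same mean,

$$ r_{12} = 1 - \frac{\Sigma D_{1-2}^2}{2N\sigma^2} $$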
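The derivation can be checked numerically. This sketch (hypothetical helper names; population standard deviations throughout) computes the ordinary product-moment $r$, the difference formula, and the equal-sigma shortcut for two small series that happen to share both mean and sigma, so all three agree.

```python
from math import sqrt

def moments(xs):
    """Return (mean, population standard deviation)."""
    n = len(xs)
    mean = sum(xs) / n
    return mean, sqrt(sum((x - mean) ** 2 for x in xs) / n)

def r_product_moment(xs, ys):
    """Ordinary product-moment r, for comparison."""
    (mx, sx), (my, sy) = moments(xs), moments(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)

def r_difference(xs, ys):
    """Difference formula: (s1^2 + s2^2 - sD^2) / (2 s1 s2), with D = X1 - X2."""
    (_, s1), (_, s2) = moments(xs), moments(ys)
    _, sd = moments([x - y for x, y in zip(xs, ys)])
    return (s1**2 + s2**2 - sd**2) / (2 * s1 * s2)

def r_equal_sigma(xs, ys):
    """Shortcut 1 - sum(D^2) / (2 N sigma^2); needs equal means and sigmas."""
    _, sigma = moments(xs)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(xs, ys))
    return 1 - sum_d2 / (2 * len(xs) * sigma**2)

xs, ys = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]   # same mean (3) and same sigma
print([round(f(xs, ys), 6) for f in (r_product_moment, r_difference, r_equal_sigma)])
# [0.8, 0.8, 0.8]
```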


Since the condition that $\sigma_1 = \sigma_2 = \sigma$ has been imposed, it is necessary to transform either $X_1$ or $X_2$ so that the distributions of the two sets of measures will have the same variability. The basis of this transformation is the relation

$$ \frac{X_2 - M_2}{\sigma_2} = \frac{X_1 - M_1}{\sigma_1} $$

whence

$$ X_2 = M_2 + \frac{\sigma_2}{\sigma_1}(X_1 - M_1) $$

If we arbitrarily set $M_2 = 25$ and $\sigma_2 = 5$ and call the measures of this distribution Coded Standard Scores (CSS),² the formula for transforming any set of measures into coded standard scores is

$$ \text{CSS} = 25 + \frac{5(X_1 - M_1)}{\sigma_1} = 25 + \frac{10(X_1 - M_1)}{2\sigma_1} $$
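A minimal sketch of the CSS transformation, assuming the population standard deviation (divisor $N$, as in step 3 of the machine procedure) and leaving the scores unrounded; the machine procedure described in the text rounds them to integers. The function name is hypothetical.

```python
from math import sqrt

def coded_standard_scores(raw):
    """CSS = 25 + 5 (X - M) / sigma, with sigma the population standard deviation."""
    n = len(raw)
    mean = sum(raw) / n
    sigma = sqrt(sum((x - mean) ** 2 for x in raw) / n)
    return [25 + 5 * (x - mean) / sigma for x in raw]

print(coded_standard_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [17.5, 22.5, 22.5, 22.5, 25.0, 25.0, 30.0, 35.0]
```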


The machine processes to be described may be applied to measures recorded on individual cards or on large data sheets. The results of the calculations should be recorded systematically; a recommended form of work sheet is shown. The first steps are to calculate the mean ($M$), the standard deviation ($\sigma$) and $1/2\sigma$ for each set of measures. Using a calculating machine such as the Monroe or Marchant, the processes are as follows:


1. To find $\Sigma X_1$ and $\Sigma X_1^2$ of the raw scores.

Shift the carriage to the left. Lock the "1" key of the extreme left column on the keyboard. Punch the first raw score of the given variable in the right end of the keyboard and multiply it by itself. Clear the upper dial only. Punch the second score of the same variable and repeat the procedure. After all the scores of the given variable have been treated the same way, the lower dial will show $\Sigma X_1$ to the left and $\Sigma X_1^2$ on the right end. Copy these numbers on the work sheet. Repeat for the other variables.

WORK SHEET
Filing No.:          Computer:
Problem:
Population:
Variates:

1. T. L. Kelley, Statistical Method (New York: The Macmillan Company, 1925), pp. 179-180.
2. This coding of scores is identical in principle with McCall's T Scores.
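The trick in step 1 is that, with "1" locked far to the left, the keyboard holds $10^k + X$, so each multiplication by $X$ adds $X$ to the left-hand part of the lower dial and $X^2$ to the right-hand part. The sketch below emulates this with integer arithmetic; the parameter `shift` (standing in for $k$, the carriage position) and the guard conditions are assumptions of the sketch, not part of the original instructions.

```python
def machine_sums(scores, shift=10):
    """Emulate step 1 for non-negative integer scores.

    With "1" locked `shift` places to the left, the keyboard holds
    10**shift + X; multiplying by X adds X * 10**shift (accumulating the
    sum of X on the left of the dial) and X**2 (the sum of X**2 on the right).
    """
    base = 10 ** shift
    if any(x < 0 or x != int(x) for x in scores):
        raise ValueError("sketch assumes non-negative integer scores")
    if sum(x * x for x in scores) >= base:
        raise ValueError("shift too small: sum of squares would spill into sum of scores")
    dial = 0
    for x in scores:
        dial += (base + x) * x
    return divmod(dial, base)       # (sum of X, sum of X**2)

print(machine_sums([3, 5, 7]))      # (15, 83)
```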


2. To find the mean, $M_1$.
Divide $\Sigma X_1$ by $N$. Copy the quotient on the work sheet.


3. To find the sigma, $\sigma_1$.
The formula for the standard deviation may be written

$$ \sigma_1 = \sqrt{\frac{\Sigma X_1^2}{N} - M_1^2} $$

Divide $\Sigma X_1^2$ by $N$, and punch the quotient on the extreme left of the keyboard. Shift the carriage to the right. Add the quotient in the machine and then clear the keyboard and the upper dial. Mark off the decimal, if any, in the quotient by a pointer. Punch the value of the mean on the extreme left of the keyboard. Shift the carriage to the left so that the first decimal place of the mean is under the first decimal place of the quotient. By using the minus key, subtract the mean from the quotient $M_1$ times. Watch the decimal point. The number on the lower dial will then be $\sigma_1^2$. Extract the square root by hand or use a table of squares. Copy the value of sigma on the work sheet.


4. To find $1/2\sigma$.

Shift the carriage all the way to the right. Clear the keyboard and all dials. Punch "1" in the left column of the keyboard, add it in the machine and then clear the keyboard and the upper dial. Punch the value of sigma on the extreme left of the keyboard and divide in the usual way. Copy the reciprocal of sigma on the work sheet. Then divide it by 2 and record the quotient. Clear all dials and keyboard.
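Steps 2 to 4 reduce to three lines of arithmetic. The sketch (hypothetical name) takes the sums from step 1 and returns $M_1$, $\sigma_1$ and the halved reciprocal $1/2\sigma_1$.

```python
from math import sqrt

def mean_sigma_halved_reciprocal(sum_x, sum_x2, n):
    """Steps 2-4: M = sum X / N, sigma = sqrt(sum X^2 / N - M^2), then 1 / (2 sigma)."""
    mean = sum_x / n
    sigma = sqrt(sum_x2 / n - mean ** 2)
    return mean, sigma, 1 / (2 * sigma)

# Sums for the scores 2, 4, 4, 4, 5, 5, 7, 9
print(mean_sigma_halved_reciprocal(40, 232, 8))   # (5.0, 2.0, 0.25)
```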


5. To find the Coded Standard Scores.

Since the mean ($M_1$) will usually be expressed to one or two decimal places, the formula for calculating the Coded Standard Scores (CSS) may be written

$$ \text{CSS} = \left[25 - \frac{10}{2\sigma_1} M_{1(\text{decimal})}\right] + \frac{10}{2\sigma_1}\left(X_1 - M_{1(\text{integer})}\right) $$


The machine operation to perform these calculations is as follows:

Shift the carriage from the extreme left by the number of figures in the mean. Punch the value of the mean on the extreme left of the keyboard and add it in the machine. Mark off the decimal, if any, by a pointer. Clear the upper dial and the keyboard. Shift the carriage so that the extreme left column of the keyboard is just to the left of the decimal point in the mean. Punch the halved reciprocal on the extreme right of the keyboard. Place the second pointer on the lower dial so that it corresponds in position to the interval between the first and the second decimal places of the halved reciprocal on the keyboard. Clear the keyboard. Then punch 25 on the keyboard immediately to the left of this pointer. Press the plus (+) key once, thereby adding 25 in the lower dial. Clear the keyboard and again punch the halved reciprocal in the same position as before. Now the keyboard will have only the halved reciprocal, and the lower dial will show the value of the mean at the extreme left and 25 with some zeros to the right of it.

Lock the "1" key of the extreme left column of the keyboard and by subtraction eliminate the decimals of the mean on the lower dial. This accomplishes the subtraction of the decimal part of the mean, multiplied by $10/2\sigma$, from 25.¹ The lower dial will now show at the left the integer portion of the mean and at the right the corresponding CSS. As the number at the left is changed by additions or subtractions of the locked "1" at the left of the keyboard, the dial reading at the right will always be the corresponding CSS. This number should be rounded up to an integer value. The work can be facilitated by arranging the individual record cards in a sequence of increasing or decreasing raw scores. Record the CSS on the cards or record sheet. After recording the CSS for all items of a given variable, clear the machine. As a means of checking the work, shift the carriage halfway to the right and add up the CSS. Then divide the sum by $N$. The quotient should be 25 or a number very close to it. The difference between the quotient and 25 gives the average precision of the CSS. In a relatively large sample, say about 75, the loss in precision should not be greater than 0.05. Similarly find the CSS for the other variables, $X_2$, $X_3$, $X_4$, and so on.

1. This procedure assumes that the measures ($X_1$) are expressed as integers. If they are not, only the excess of decimal places in the mean should be eliminated.
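The following sketch mirrors the stepping idea of step 5: one starting value, $25 - M_{1(\text{decimal})} \cdot 10/2\sigma_1$, followed by repeated addition of $10/2\sigma_1$ for each unit of raw score. It assumes integer raw scores and reads "rounded up to an integer value" as rounding halves upward; both the function name and that rounding rule are assumptions.

```python
from math import floor

def css_by_stepping(mean, sigma, low, high):
    """Rounded CSS for every integer raw score from low to high.

    Start from 25 - (decimal part of M) * 10/(2 sigma), the CSS of a raw
    score equal to the integer part of M, then move by 10/(2 sigma) for
    each unit change in the raw score (the locked "1" on the machine).
    """
    m_int = floor(mean)
    m_dec = mean - m_int
    step = 10 / (2 * sigma)
    start = 25 - m_dec * step
    # Rounding rule assumed here: halves go upward.
    return {x: floor(start + (x - m_int) * step + 0.5) for x in range(low, high + 1)}

print(css_by_stepping(5.4, 3.0, 2, 8))
# {2: 19, 3: 21, 4: 23, 5: 24, 6: 26, 7: 28, 8: 29}
```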

We are now ready to evaluate the right-hand member of the formula

$$ r_{12} = 1 - \frac{\Sigma D_{1-2}^2}{2N\sigma^2} $$

Since $\sigma$ has been arbitrarily set at 5.00, this formula becomes

$$ r_{12} = 1 - \Sigma D_{1-2}^2 \left(\frac{1}{50N}\right) $$

Calculate the reciprocal $1/50N$ and record it on the work sheet.


6. To find $\Sigma D_{1-2}^2$.

The difference $D_{1-2} = X_1 - X_2$, but since the various sets of CSS have a common mean (25), the difference of the transformed scores may be used. Hence subtract the CSS of one variable from the corresponding CSS of the other variable. Since the entire range of CSS is 50, this subtraction can be done by inspection. On some handy table of squares find the squares of these differences and add them on the machine. In time the squares of numbers between 1 and 50 are memorized and the table dispensed with. Record $\Sigma D_{1-2}^2$ on the work sheet.


7. To find $r$.

Shift the carriage to the extreme left. Punch the reciprocal of $50N$ on the extreme right of the keyboard. Set a pointer on the lower dial to mark the position of the decimal point. When $\Sigma D^2$ is greater than $50N$ the $r$ is negative; when it is smaller than $50N$ the $r$ is positive. Compute all the negative $r$'s first and then all the positive $r$'s.

For the negative $r$'s, multiply the reciprocal of $50N$ by $\Sigma D^2$. The number to the right of the pointer will give the value of $r$. Record it on the work sheet with a negative sign.

For the positive $r$'s, put a unit in the lower dial to the left of the pointer and then subtract the reciprocal of $50N$, $\Sigma D^2$ times. The decimals to the right of the pointer on the lower dial will give the value of positive $r$. Record it on the work sheet. The computation of the negative and positive $r$'s should be carried out as a continuous process.
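Steps 6 and 7 amount to $r = 1 - \Sigma D^2/50N$. In the sketch below (hypothetical names), the two branches mirror the machine routine for negative and positive $r$; arithmetically both equal $1 - \Sigma D^2/50N$. The CSS are left unrounded here, so the result equals the product-moment $r$ apart from floating-point error.

```python
from math import sqrt

def css(raw):
    """Unrounded coded standard scores (mean 25, sigma 5)."""
    n = len(raw)
    m = sum(raw) / n
    s = sqrt(sum((x - m) ** 2 for x in raw) / n)
    return [25 + 5 * (x - m) / s for x in raw]

def r_from_css(css1, css2):
    """Steps 6-7: r = 1 - sum(D^2) / (50 N), read as the text describes."""
    n = len(css1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(css1, css2))
    reading = sum_d2 * (1 / (50 * n))   # multiply by the reciprocal of 50N
    if sum_d2 > 50 * n:
        return -(reading - 1)           # negative r: the decimals past the unit
    return 1 - reading                  # positive r: unit minus the product

x1 = [1, 2, 3, 4, 5]
print(round(r_from_css(css(x1), css([2, 1, 4, 3, 5])), 6))   # 0.8
print(round(r_from_css(css(x1), css([5, 4, 3, 2, 1])), 6))   # -1.0
```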

The application of this technique results in a quite appreciable saving of time in finding intercorrelations. A job analysis was made on the basis of a population of 100, with raw scores ranging from 0 to 300. The Coded Standard Scores were computed with sigma equal to 5 and mean at 25. Ten variables were treated, leading to 45 intercorrelations.

The study of time consumed in performing all the operations by a clerk with limited experience in statistical work yielded the following results:

Finding of ten means and ten sigmas          175 min.
Finding of CSS, recording and checking       190 min.
Finding ΣD² for 45 pairs                     260 min.
Finding of 45 r's by formula                  10 min.
Total                                        635 min.

This job analysis shows that an inexperienced person can compute a series of 45 intercorrelations with a population of 100 in about 14 minutes per correlation (635 min. ÷ 45 ≈ 14.1 min.). An experienced clerk should cut this time to about 10 to 12 minutes. It is obvious that the saving of time is greater when the number of variables grows larger.

This technique offers several advantages. 
The machine calculation of r from Coded 
Standard Scores tends to eliminate the error 
due to coarseness of grouping. The Coded 
Standard Scores, being essentially standard 
scores, allow direct comparison between the 
scores of a given individual, provided the 
distributions of the raw scores do not devi- 
ate markedly from normal. By a simple in- 
spection of a series of CSS recorded on the 
cards one can judge the "profile" of the 
given individual. 
















One disadvantage appeared to detract somewhat from the value of this technique. There is some loss in precision due to rounding up the CSS to integers. This loss, however, is practically negligible, since the errors tend to balance one another as the size of the sample increases. In a relatively large sample these errors have only a slight effect on the third decimal place of $r$. In the vast majority of work in correlation analysis a precision of two places is sufficient.




THE CONSTRUCTION AND INTERPRETATION OF 
DIFFERENTIAL ABILITY PATTERNS 
by 
David Segel 
U. S. Office of Education 


The most valid method of determining whether or not a person will be successful in an occupation or certain line of scholastic endeavor is for him to try it. If a person wants to become an engineer, the most valid method to find out if he can become one is for him to attempt the necessary training and, if successful at that, to go out and attempt the job. This trial-and-error method, however, is costly for both the individual and society. For the individual it is a costly procedure because he may spend much time in trying out various lines of work and, in failing at jobs for which he is actually unfitted, he may conclude that he is unfitted for all other occupations. It is costly for society because it may spend time and effort in educating individuals for tasks that they will perform at a very low level of efficiency even if they can be taught to perform them. This is particularly true where the individual is educated at public expense.

A person who has failed during the preparation for one occupation often makes no attempt to prepare himself for another occupation. This is especially true in those lines requiring considerable schooling. Upon failing in the preparation for or in the practice of one occupation, quite often the individual and sometimes his advisors feel that the general level of the occupation was too high for him. The tragedy of this point of view lies in the realization that success might have been attained in some other occupation on the same level, into which a shift could have been accomplished without much loss of the preparation made for the first occupation. The earlier stages of preparation for occupations on the same level are often much the same. By shifting from one type of preparation to another there would often be only a small loss to the individual and, therefore, also to society.








What is needed in guidance, therefore, 
is a method of determining the relative 
probabilities of success in different occu- 
pations or scholastic areas. At the end 
of various stages of secondary and higher 
education, it should be decided, naturally, 
whether or not a student should attempt a 
higher level of education. For instance, by 
the end of the junior high school level the 
decision must be made as to whether or not 
the student should look forward to several 
years in high school and junior college or 
college work or whether the high-school 
course should be the terminal course for 
him. Such a decision can be made in part 
on the basis of tests measuring general ap- 
titude. If the decision is that high school 
is to be a stepping-stone to junior college 
or college, the particular course may be 
fairly safely left to the interests of the 
student as long as college entrance require- 
ments are being met. On the other hand, 
however, if it is decided that the high- 
school course shall be the terminal course 
for a student, then it becomes a question as 
to what subjects or what course should be 
pursued. The decision, in the case of a 
boy, might lie between a great number of 
commercial, mechanic arts, music, art, lit- 
erary and other courses. General prediction 
does not aid to any extent in this problem 
of specialization. If tests of various kinds were taken and various subjects studied during the junior high-school course, an indication of the direction of the talent of a student may be seen. The ascertainment of this trend of talent we believe to be an important part of the work of those who counsel students.

The most accurate determination of potential differential ability from evidence is through certain statistical methods. The value of a single prognostic measure is indicated by the simple correlation between the predictive measure and the criterion of success. If several prognostic measures are used in combination, the best weight for each must be found and the multiple correlation resulting from such a combination must be determined. The method for finding the best weights for a combination of predictive items is through the use of the regression equation. The actual prediction is also made through the regression equation.¹

Although it is true that the application of the regression equation technique gives the most accurate results for prognosis, for practical purposes certain other methods may be found valuable. An important method of meeting the problem of practical prediction has been to graph objective data for each individual and, by inspecting this graph (noting the score on each test and the general pattern in relation to some general standard), to arrive at some judgment concerning the future success of the individual. Such a method is illustrated, on the secondary level, by the Practical Prediction Chart described by Brintle.² We have developed a method of showing test results graphically from which, by inspection, a rough differential prediction of success may be made.

The data used to illustrate this method were found in the records of 100 boys who had filled out the Strong Interest Blank³ and had taken the Iowa High School Content Examination⁴ at the beginning of their college career. For our illustration we will use the scores on the five interest ratings of the Strong Interest Blank which proved to have the greatest value for differential prediction. These interest ratings were those of engineering, medicine, life insurance salesmanship, personnel management, and purchasing agent. Of the four achievement test scores obtainable on the Iowa High School Content Examination, we have used two, namely, English literature and mathematics.








Chart I shows the general differential ability pattern of Strong interest scores indicating a difference in achievement in English literature and mathematics as measured by the first two parts of the Iowa test. In this chart the solid line represents the average (mean) score on five of the interest ratings of those students who did better in the English literature section of the Iowa test than they did on the mathematics section of the same test. Since the scores on the English literature tests and the mathematics tests are not directly comparable, the scores on each of the tests were reduced to percentiles based on the group of 100 boys. Those students whose percentile scores were higher for English literature than for mathematics were thereupon judged to be superior in English literature. Similarly, the other (broken) line of the chart represents average interest scores for those students who had higher percentile scores in mathematics than in English literature. This chart thus gives the picture of the differences in interest scores obtained by students who do better in one subject of a pair of subjects than in the other. (See page 285 for Chart I.)

This means, in general, that those students who have high interest scores in engineering, medicine, and purchasing agent and low interest scores in life insurance salesmanship and personnel management will do better in mathematics than in English literature.
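A minimal sketch of how the two lines of Chart I could be computed: convert each achievement score to a percentile rank within the group, split the students by which percentile is higher, and average each interest score within the two groups. It assumes the interest ratings have been coded as numbers (the Strong ratings are letter grades); the data, key names, and the choice to drop students with equal percentiles are hypothetical.

```python
from statistics import fmean

def percentile_ranks(values):
    """Percentile rank of each value within its own group (ties share a mid-rank)."""
    n = len(values)
    return [100 * (sum(w < v for w in values) + 0.5 * sum(w == v for w in values)) / n
            for v in values]

def differential_pattern(students, subject_a, subject_b, interests):
    """Mean interest scores for students better in subject_a vs better in subject_b."""
    pa = percentile_ranks([s[subject_a] for s in students])
    pb = percentile_ranks([s[subject_b] for s in students])
    better_a = [s for s, a, b in zip(students, pa, pb) if a > b]
    better_b = [s for s, a, b in zip(students, pa, pb) if b > a]   # ties are dropped

    def means(group):
        return {k: fmean(s[k] for s in group) for k in interests}

    return means(better_a), means(better_b)

students = [
    {"lit": 90, "math": 50, "engineering": 2, "medicine": 3},
    {"lit": 40, "math": 80, "engineering": 5, "medicine": 4},
    {"lit": 70, "math": 60, "engineering": 3, "medicine": 3},
    {"lit": 50, "math": 90, "engineering": 6, "medicine": 5},
]
print(differential_pattern(students, "lit", "math", ["engineering", "medicine"]))
# ({'engineering': 2.5, 'medicine': 3.0}, {'engineering': 5.5, 'medicine': 4.5})
```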

In Chart II we have drawn in a third line on the general differential pattern shown in Chart I. This third line represents the scores of an individual student on these five interests. He made scores as follows: engineering, A; medicine, B; life insurance (salesmanship), B; personnel management, C; and purchasing agent, B. By inspection the pattern of interests of this individual may be compared with the general patterns. The pattern of this student's interests is most like that of those students who will do better in mathematics than in English literature. It is only in life insurance salesmanship that he departs materially from this general pattern. An inspection of this pattern should aid the counselor of this particular student in giving him advice. (See page 286 for Chart II.)

CHART I.--DIFFERENTIAL ABILITY PATTERN FOR ACHIEVEMENT IN ENGLISH LITERATURE AND MATHEMATICS
[Interest ratings plotted: Engineering, Medicine, Life Insurance, Personnel Management, Purchasing Agent. Solid line: Students Who Do Better in English Literature Than in Mathematics. Broken line: Students Who Do Better in Mathematics Than in English Literature.]

1. For the statistical techniques developed for the accurate prediction of the differential success of an individual, see: David Segel, "Differential Prediction of Ability as Represented by College Subject Groups", Journal of Educational Research, XXV (January and February, 1932).
2. S. L. Brintle, "A Practical Prediction and Guidance Chart", Junior College Journal, III (March, 1933).
3. The Strong Interest Blank is a questionnaire designed to obtain the interest in various occupations of a professional or semi-professional nature. The questionnaire is scored separately for each occupation. It is published by the Stanford University Press.
4. The Iowa High School Content Examination measures four aspects of high-school achievement. The two with which we have to do here are English literature and mathematics. It is published by the University of Iowa.

Similarly, the pattern of interests of other individual students may be compared with these general patterns and thereby a differential prediction made. The method illustrated is not restricted, of course, to interest scores. Any kind of objective data which you have reason to believe has predictive value may be put in this form. In the present study our criterion data were obtained at the same time as our original measures. This produces a higher correlation than is the case when success is measured after a period of time. This limitation of our present study does not affect the techniques described, which are applicable in any case. By the use of such patterns as have been described, test scores and ratings may be utilized in showing potential differential ability.

Let us assume for purposes of further illustration that a high-school counselor wishes to set up differential patterns for use in distinguishing between relative potential success in salesmanship, stenography, mechanic arts, and art, for use with entering high-school students in aiding those whose education will end with the high-school period. Let us assume further that scores on tests A, B, and C have been shown to be efficient in predicting success in these four high-school courses. The procedure will then be as follows:

(1) Assemble the test data and the marks made in the subject fields mentioned above for 100 or more students. This requires that a period of time elapse after the giving of the tests so that marks in the different subjects will be available for the students who took the tests.

















JOURNAL OF EXPERIMENTAL EDUCATION 







Volume II, No. 2 





CHART II.--A COMPARISON OF AN INDIVIDUAL STUDENT'S PATTERN WITH THE GENERAL DIFFERENTIAL PATTERN FOR ACHIEVEMENT IN ENGLISH LITERATURE AND MATHEMATICS

[Chart: interest scores in Engineering, Medicine, Life Insurance, Personnel Management, and Purchasing Agent, with lines for students who do better in English literature than in mathematics, for students who do better in mathematics than in English literature, and for one student's pattern of interests.]
(2) Assemble together all scores on test A for pupils who have done better in salesmanship courses than in stenographic courses. Then assemble all scores on test A for pupils who have done better in the stenography course than in salesmanship courses.

(3) Assemble as in Number 2 the scores 
on tests B and C. 

(4) Average the test scores in each of the lists assembled in (2) and (3).

(5) Plot these averages in a single graph. Two lines will result; one will show the scores of those pupils who do better in salesmanship courses than in stenographic courses, while the other will show the scores of those pupils who do better in stenographic courses than in salesmanship courses.

(6) Follow steps (2) to (5) inclusive 
for each of the other pairs of subjects. 





(7) These patterns are now ready to be used for predictive purposes. By drawing in the pattern of the scores made by one student in each of the pairs of subjects, his potentiality in regard to each pair will be revealed and the accuracy of the guidance will be increased. Once patterns are found to exist under varying school conditions, it will be unnecessary to remake patterns for different school systems.
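The bookkeeping in steps (2) through (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a `marks` and a `scores` dictionary per student), the function names, and the tiny data set are all hypothetical, and students whose marks in the two subjects are equal are simply left out, a point the procedure above does not settle. Each group must contain at least one student, or `statistics.mean` raises an error.

```python
from itertools import combinations
from statistics import mean


def differential_pattern(students, subject_a, subject_b, tests):
    """Steps (2)-(4): split students by the subject of the pair in which they did
    better, then average each test score within each group."""
    better_a = [s for s in students if s["marks"][subject_a] > s["marks"][subject_b]]
    better_b = [s for s in students if s["marks"][subject_b] > s["marks"][subject_a]]
    # Ties are excluded here (an assumption; the procedure does not say).
    line_a = {t: mean(s["scores"][t] for s in better_a) for t in tests}
    line_b = {t: mean(s["scores"][t] for s in better_b) for t in tests}
    return line_a, line_b


def all_patterns(students, subjects, tests):
    """Step (6): repeat the construction for every pair of subjects."""
    return {
        (a, b): differential_pattern(students, a, b, tests)
        for a, b in combinations(subjects, 2)
    }


# Hypothetical records: marks in two courses and scores on tests A, B, and C.
students = [
    {"marks": {"salesmanship": 85, "stenography": 70}, "scores": {"A": 40, "B": 20, "C": 30}},
    {"marks": {"salesmanship": 60, "stenography": 80}, "scores": {"A": 25, "B": 35, "C": 30}},
    {"marks": {"salesmanship": 75, "stenography": 65}, "scores": {"A": 44, "B": 22, "C": 28}},
]

sales_line, steno_line = differential_pattern(
    students, "salesmanship", "stenography", ["A", "B", "C"]
)
print(sales_line)  # {'A': 42, 'B': 21, 'C': 29}
print(steno_line)  # {'A': 25, 'B': 35, 'C': 30}
```

The two dictionaries returned for a pair are the points of the two lines plotted in step (5).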

Up to this point the method of construction and the possible uses of differential ability patterns have been discussed. We wish to show also that the method of constructing differential ability patterns which has been described gives results in accordance with the more rigorous methods of differential prediction.

For this, let us examine Chart III. Here, beside the differential pattern, is given, first, the differential correlation coefficient for each interest. These coefficients have been calculated between each of the interests and the difference between success in English literature and mathematics in favor of English literature. A coefficient of -.55 therefore indicates that interest in engineering is correlated negatively with success in English literature compared with success in mathematics. This is in accord with the pattern in the chart, since the line indicating comparatively greater success in English literature is associated with the lower interest in engineering. Similarly for the other correlation coefficients: as the lines cross in the differential pattern, the sign of the correlation coefficient also changes. This is in accordance with the meaning of the lines.

CHART III.--RELATION OF DIFFERENTIAL ABILITY PATTERNS AND DIFFERENTIAL COEFFICIENTS OF CORRELATION AND REGRESSION EQUATION COEFFICIENTS

[Chart: the differential pattern for Engineering, Medicine, Life Insurance, Personnel Management, and Purchasing Agent, with lines for students who do better in English literature than in mathematics and for students who do better in mathematics than in English literature. Beside each interest are its differential correlation coefficient and its regression equation coefficient (for Engineering, -.55 and -3.135).]
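As a sketch of how such a differential coefficient can be computed, the snippet below correlates an interest score with the difference between the English-literature and mathematics marks, taken in favor of English literature. The product-moment formula is the ordinary one; the function names, marks, and interest scores are invented for illustration and are not Segel's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def differential_r(interest, english, mathematics):
    """Correlate an interest score with (English mark - mathematics mark)."""
    difference = [e - m for e, m in zip(english, mathematics)]
    return pearson_r(interest, difference)


# Invented figures: four students' engineering-interest scores and marks.
engineering = [10, 20, 30, 40]
english = [80, 75, 70, 60]
mathematics = [60, 70, 75, 85]
print(round(differential_r(engineering, english, mathematics), 2))  # -0.99
```

A negative result, as here, means that the interest goes with relatively better work in mathematics, which is how the -.55 for engineering is read above.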








The regression equation which gives the estimated difference score between English literature and mathematics is as follows:

$$X_0 = -3.135\,X_1 - 1.94\,X_2 + .66\,X_3 + 1.29\,X_4 - 5.42\,X_5 + C$$

where $X_0$ is the estimated difference and $X_1, X_2, \ldots, X_5$ are respectively the scores in the five interests. The regression coefficients are also placed on the chart. The signs of these coefficients are the same as those of the correlation coefficients and are therefore also in agreement with the meaning of the lines in the chart. These relationships show that the general trends of the differential pattern in the chart are valid and may therefore be used for guidance on an inspectional basis.
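The equation can be applied to a student's five interest scores directly. The constant C is not reported, so the sketch below, which assumes the interest scores are supplied in the same order as the terms X1 through X5, leaves it as a parameter; the difference between two students' estimates does not depend on it. The function name and the students' scores are hypothetical.

```python
# Regression coefficients as printed for X1 ... X5 (the five interest scores).
COEFFICIENTS = (-3.135, -1.94, 0.66, 1.29, -5.42)


def estimated_difference(interest_scores, c=0.0):
    """X0 = sum(b_i * X_i) + C. C is not reported, so it defaults to 0 here."""
    if len(interest_scores) != len(COEFFICIENTS):
        raise ValueError("expected five interest scores, in the order X1 ... X5")
    return sum(b * x for b, x in zip(COEFFICIENTS, interest_scores)) + c


# Two hypothetical students' interest scores.
student_1 = [50, 40, 30, 45, 20]
student_2 = [30, 40, 30, 45, 35]
gap = estimated_difference(student_1) - estimated_difference(student_2)
print(round(gap, 2))  # 18.6
```

A positive gap means the first student is estimated to stand further toward English literature, relative to mathematics, than the second.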










THE INSIGNIFICANCE OF SIGNIFICANT DIFFERENCES 


by 


Edward A. Lincoln


University of California 
Berkeley, California 


Very early in the history of statistics the workers discovered the possibility of using the ratio between a measure and its probable or standard error as a means of estimating the significance of the obtained measure. It is reported by Dr. Walker in her historical studies that "Gauss, Quetelet, Galton, and others were accustomed to use the fraction

$$\frac{\text{deviation}}{\text{probable error of deviation}}$$

as an argument from which to determine the probability of the occurrence of an error of a particular magnitude."¹ The first use of such a ratio, according to Dr. Walker, was made by De Moivre in 1738.²

In the first texts on statistics in 
psychology and education this problem of re- 
liability was not adequately treated. As 
educational and psychological research pro- 
gressed, however, it became clear to the re- 
search workers that the significance of 
measures was important. This question be- 
came very vital when the research worker or 
experimenter was dealing with the differences 
between two groups, as when the two sexes 
were compared, or when the traits or abili- 
ties of several groups were under scrutiny, 
as in a study of race or nationality differ- 
ences. Finally, the technique of comparing 
groups became well established, textbooks 
gave due attention to the problem, and at 
present no study is considered adequate if 
it does not report the reliability of any 
differences which may be found. 

The method of finding the reliability of a difference is now too well known to require exposition here. If the measures used are uncorrelated, the probable error of a difference is given by the formula

$$PE_D = \sqrt{PE_1^2 + PE_2^2}$$

The standard error of the difference is obtained by using

$$\sigma_D = \sqrt{\sigma_1^2 + \sigma_2^2}$$

If the measures are correlated, the formulas become

$$PE_D = \sqrt{PE_1^2 + PE_2^2 - 2r_{12}\,PE_1\,PE_2}$$

and

$$\sigma_D = \sqrt{\sigma_1^2 + \sigma_2^2 - 2r_{12}\,\sigma_1\,\sigma_2}$$

After the probable or standard error has been obtained, the significance of the difference is computed by dividing the difference by its error, as

$$\frac{D}{PE_D} \quad\text{or}\quad \frac{D}{\sigma_D}$$

A difference is considered significant if it is at least four times as large as its probable error or three times as large as its standard error.
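The formulas above translate directly into code. In this sketch the same function serves for probable errors and for standard errors, since both combine in the same way; the function names are illustrative, and the figures in the usage lines are the total-score difference (9.1) and its standard error (1.61) from Rulon's data reported below. Since a probable error is roughly 0.6745 of a standard error, four probable errors amount to about 2.7 standard errors, so the two rules of thumb are close but not identical.

```python
from math import sqrt


def error_of_difference(e1, e2, r12=0.0):
    """Probable (or standard) error of a difference; r12 = 0 for uncorrelated measures."""
    return sqrt(e1 ** 2 + e2 ** 2 - 2 * r12 * e1 * e2)


def critical_ratio(difference, error):
    """The difference divided by its probable or standard error."""
    return difference / error


def is_significant(difference, error, kind="sigma"):
    """Conventional rule: at least 4 probable errors, or 3 standard errors."""
    threshold = 4 if kind == "pe" else 3
    return abs(critical_ratio(difference, error)) >= threshold


print(error_of_difference(3, 4))            # 5.0 (uncorrelated measures)
print(round(critical_ratio(9.1, 1.61), 2))  # 5.65 (Rulon's total score)
print(is_significant(9.1, 1.61))            # True
```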

The use and interpretation of probable or standard errors of a difference imply that the data from which the calculations were made constitute a random sample of the population for which generalization is being attempted. This implied assumption greatly limits the determination of statistical significance. Furthermore, the probable or standard error of a difference does not take into account systematic errors of measurement in the data, or the possibility of errors of validity. In some types of investigation there may be still other limitations of the data that are not accounted for. Even when the usual interpretation of statistical significance is justified, the question of the practical significance of a difference may be raised. The term "statistical significance" has a mathematical meaning. There is need for a practical meaning. It is the purpose of this paper to consider the problem of statistically significant differences in the light of what can be learned from the overlapping of the groups, and to contrast what is revealed with the usual interpretation of "statistically significant."

1. Helen M. Walker, Studies in the History of Statistical Method (Baltimore: Williams and Wilkins, 1929), p. 179.

2. Ibid., p. 180.

For purposes of analysis, two recently 
published studies have been chosen which 
seem to represent very well the typical investigation and report. It should be understood by the reader that scores of other
studies might have been selected, and that 
the choice of these two does not imply that 
they are inferior in any way. In fact, the 
writer considers both of them to be excel- 
lent. 

First, let us turn to a report on the 
efficacy of the sound motion picture in 
teaching science. In Rulon's experiment¹ two
equated groups were taught from a specially 
written textbook. The Film Group were taught 
with the aid of a specially prepared sound 
picture; the Control Group were not. 

In Table I certain of Rulon's results 
are presented. The ratios of the differences 
to their standard errors, 6.6, 3.9, and 5.6,
are clearly significant. In fact, the chan- 
ces are well over 99 in 100 that the true 
difference is greater than zero, and it is 
practically certain that the children learned 


more physiography and biology when the sound 
film was used. 


TABLE I

RESULTS OF RULON'S SOUND FILM EXPERIMENT

                   Control          Film
Test            Mean      σ      Mean      σ      σ(D)
Physiography    32.6    19.0     37.7    18.3     .78
Biology         48.3    25.4     52.2    24.4    1.00
Total Score     80.9    42.2     90.0    40.1    1.61





























But now comes the practical question. 
Should superintendents and school boards 
hasten to equip their science classrooms 
with sound film apparatus? Let us look at 
the data in another way before we decide. 

Consider the overlapping of the two 




groups in total scores. The difference between the two means is 9.1 points. The standard deviation of the control group is 42.2. Dividing, we find that the mean of the film group is .220 standard deviation above the mean of the control group. From tables of the normal probability curve we find that the interval between the mean and a point .220 standard deviation above it contains 8.7 per cent of the cases, and therefore 41.3 per cent of the control group made scores which were above the film group mean.

By the same method we find that 40.9 
per cent of the film group made scores which 
were below the mean of the control group. 

Turning to the separate parts of the 
test, similar computations show that in 
Physiography, 39.4 per cent of the control 
group scored above the film group mean, and 
37.8 per cent of the film group scored be- 
low the mean of the control group. In Biol- 
ogy, 44.0 per cent of the control group did 
better than the mean of the film group, and 
43.6 per cent of the film group could not 
surpass the mean of the controls. 
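The overlapping figures rest on a single normal-table look-up. A minimal sketch with Python's `statistics.NormalDist`, assuming each group is normally distributed, reproduces the .220 case for total scores and the physiography figure for the control group (a difference of 37.7 - 32.6 = 5.1 points against a standard deviation of 19.0). The function names are illustrative.

```python
from statistics import NormalDist


def percent_beyond(z):
    """Per cent of a normal distribution lying more than z SDs above its mean."""
    return 100 * (1 - NormalDist().cdf(z))


def percent_beyond_other_mean(mean_difference, sd):
    """Per cent of a group (with standard deviation sd) lying beyond the other
    group's mean, when the two means differ by mean_difference."""
    return percent_beyond(abs(mean_difference) / sd)


print(round(percent_beyond(0.220), 1))                 # 41.3 (total scores, control group)
print(round(percent_beyond_other_mean(5.1, 19.0), 1))  # 39.4 (physiography, control group)
```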

In short, the overlapping of the two 
groups is so great that the significance of 
the difference between the means becomes 
largely overshadowed by the individual dif- 
ferences within the groups. Where an in- 
dividual pupil is concerned there are at 
least 4 chances in 10 that he will learn as 
well without the sound film as with it. With 
these facts in mind, the practical schoolman 
will certainly hesitate before spending 
money for sound film equipment. 

A second series of data for consideration comes from a study by Durrell.² In
Table II will be found the means, differ- 
ences, probable errors, and critical ratios 
taken from Durrell's article, and the per- 
centages of overlapping computed by the pres- 
ent writer. The standard deviations used in 
the computations were not presented in the 
article, but were kindly supplied by Dr. 
Durrell. The means in the first column are 
means of I.Q.'s obtained from individual
Stanford-Binet tests. Those in the second 
column were obtained from group examinations. 
The group examinations used were Haggerty's 
Delta II for the groups represented in rows 
1, 2 and 4; while the Otis Self-Administer- 
ing Intermediate Examination was used for the 








groups in rows 3 and 5. The first column under the heading "Overlapping" shows the per cent of individual I.Q.'s which equal or surpass the mean group I.Q., and the second column shows the percentage of group I.Q.'s below the mean of the corresponding Binet I.Q. distribution.

1. P. J. Rulon, The Sound Motion Picture in Science Teaching, Harvard Studies in Education, XX (Cambridge, Massachusetts: ...).

2. D. D. Durrell, "The Influence of Reading Ability on Intelligence Measures," Journal of Educational Psychology, XXIV (September, 1933), p. 412.


TABLE II

DATA FROM DURRELL'S STUDY

        Mean I.Q.                                    Overlapping
Row    Binet     Group       D     PE(D)   D/PE(D)    1-2      2-1
1      92.09    108.92    16.83    1.19    14.2      15.2%    20.1%
2      93.7     114.5     20.8     2.92     7.13     10.3%    15.3%
3      96.2     106.9     10.0     1.77     5.65     26.4%    31.6%
4     106.7      98.6      8.1     1.75     4.63     29.8%    31.9%
5     105.5      99.9      5.6     2.05     2.75     36.7%    38.6%











An examination of this table will re- 
veal that even when there is a critical ra- 
tio of 4 or 5 there may be an overlapping of 
as much as thirty per cent. This is a mat- 
ter worthy of practical consideration, es- 
pecially, again, where an individual pupil 
may be concerned. 

A very surprising fact shown by the ta- 
ble is that the overlapping is not always 
proportional to the size of the critical ra- 
tio. In row 2, where the critical ratio is 
7.13, the overlapping is actually less than 
in row 1, where the critical ratio is twice 
as great. 
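The reason the two measures can disagree is that the critical ratio depends on the number of cases while the overlapping does not. The sketch below assumes two independent groups of equal size n with a common standard deviation σ, so that the standard error of the difference between means is σ√(2/n); it holds the separation of the means fixed at half a standard deviation and varies n. The function names and chosen values are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def overlap_percent(d):
    """Per cent of the lower group reaching the higher group's mean, d in SD units."""
    return 100 * (1 - NormalDist().cdf(d))


def critical_ratio_equal_groups(d, n):
    """D / sigma_D for two independent groups of n cases with a common SD."""
    return d / sqrt(2 / n)


for n in (50, 200, 800):
    print(n, round(critical_ratio_equal_groups(0.5, n), 1), round(overlap_percent(0.5), 1))
```

Under these assumptions the critical ratio doubles each time n is quadrupled (2.5, 5.0, 10.0), while about 30.9 per cent of the lower group still reach the mean of the higher group.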

The writer's point may appear more clearly from an examination of Figure 1. This shows the state of affairs when the portion of one group reaching or exceeding the average of the other is 25 per cent.¹ It is apparent from this diagram that, whatever the statistical significance of the difference




may be, the groups are very little different 
from a practical point of view. Probably a 


difference is not of much practical value 
unless at least 90 per cent of one group ex- 
ceed the average of the other. 








Figure 1. The amount of difference 
between two groups when 25 per cent of 
one group equals or exceeds the mean of 
the other group. 
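The separation pictured in Figure 1, and the separation the 90 per cent criterion would demand, can be read from the inverse of the normal curve. This sketch assumes normal distributions with equal standard deviations, in which case 25 per cent of the lower group reaching the higher mean is the same situation as 75 per cent of the higher group exceeding the lower mean. The function name is illustrative.

```python
from statistics import NormalDist


def separation_for_share(share):
    """Difference between means, in SD units, at which `share` of the higher
    group exceeds the lower group's mean (normal curves, equal SDs)."""
    return NormalDist().inv_cdf(share)


print(round(separation_for_share(0.75), 3))  # 0.674: the situation of Figure 1
print(round(separation_for_share(0.90), 3))  # 1.282: the 90 per cent criterion
```

On these assumptions the suggested criterion asks for means nearly twice as far apart as those shown in Figure 1.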


The implication of these considerations 
seems to be that the present ordinary 
method of studying and reporting the sig- 
nificance of differences is, in certain in- 
stances, at least, quite inadequate. This 
is especially true when practical questions 
of school procedure are under considera- 
tion, for a difference that is statistical- 
ly significant may turn out to be relative- 
ly unimportant when the facts of overlap- 
ping are considered. Research workers 
would do well to supplement their critical 
ratios with studies of overlapping, and 
thus reveal the practical insignificance 


of many statistically significant differ- 
ences. 





1. A similar figure and others showing the amounts of difference for other degrees of overlapping will be found in E. L. Thorndike, Educational Psychology: Briefer Course (New York: Teachers College, Columbia University, 1915), pp. 342-344.


PREDICTING STABILIZED SALARY-SCHEDULE COSTS 


by 




Douglas E. Scates, Director 
Bureau of School Research 
Cincinnati, Ohio 


At a time of economic stress such as the present, when school systems have generally been forced to consider some form of reduction in personnel expenditures, it is particularly disturbing to a superintendent to realize that the salary schedule which is in force calls for an increasing outlay year after year for an unknown period of time. It is quite natural that superintendents and boards of education should desire to know how many years large increases in the payroll will continue. Is the end so near that giving up the struggle would be unbecoming, or is it so distant that the effort to keep the schedule alive would be a futile one?

The answer to such questions calls for knowledge of how much the payroll will increase each year owing to provisions in the schedule, until the new schedule ceases to be a dominating factor in the annual payroll changes. There will normally be certain increases in payroll due to expansions in the staff necessitated by a growing pupil population; but these are relatively independent of the schedule and would occur even if there were no progressive schedule. We are not here concerned with methods of estimating these increases, but will note merely that they are additional to changes traceable directly to the schedule. The chief problem is that of estimating the effects of the new schedule; and when changes caused by the introduction of the new schedule have ceased, we may regard the distribution of salaries as stabilized. The problem therefore becomes one of estimating the number of years required for stabilization to take place, and the average salary at which this will occur.

Stabilization of teachers' salaries under a new plan results when the total annual saving in salaries occasioned by the replacement of teachers is equal to the total of the annual increments of the teachers who remain in the service. Normally this will occur when a sufficient number of teachers are at their maxima, and do not receive further increments, so that the increments received by the remaining teachers will not exceed the amount saved by hiring new teachers at low salaries to replace teachers with higher salaries who leave the system. This condition may be illustrated by considering a hypothetical situation in which the salary schedule ranges from a minimum of $1,400 to a maximum of $2,000 with six equal increments of $100 each. Forty teachers leave the system each year and 40 new teachers are employed to replace them. Five of these new teachers are given recognition on the schedule for a year or two of service elsewhere. It is assumed that increments are automatic. The figures used in the accompanying table (Table I on the following page) do not have any connection with the actual situation in the Cincinnati Public Schools, nor are they intended to carry any implication as to what is desirable. It should also be understood that we are regarding stabilization from the point of view of a new salary schedule only; other large conditioning factors will appear every few years to disturb the situation after the introduction of the new schedule has ceased to be an effective disturbing influence.

1. Salary. Each salary level repre- 
sents one increment above the salary level 
below. When increments are automatic and 
annual, this means that each level repre- 
sents one more year of service than the 
level below it. 

2. Teachers. This is the distribution 
of teachers who are assumed to be in service 
during the year, showing the number of 
teachers who receive each salary. 
















TABLE I

NUMERICAL ILLUSTRATION OF THE CONDITIONS NECESSARY FOR STABILIZATION

(1)       (2)       (3)     (4)      (5)      (6)      (7)      (8)
Salary    Teachers  Leave   Save     Remain   New      Employ   Teachers
                                              Level
$2,000      80        8    $4,800     72       80        -        80
 1,900       8        0         0      8        8        -         8
 1,800      11        3     1,200      8       11        -        11
 1,700      16        5     1,500     11       16        -        16
 1,600      23        7     1,400     16       21        2        23
 1,500      29        8       800     21       26        3        29
 1,400      35        9         0     26        -       35        35
Total      202       40    $9,700    162      162       40       202
3. Leave. The number of teachers who are assumed to leave the system each year at the various salary levels.

4. Save. The amount of money that is saved at each level when the teachers who leave the system are replaced by new teachers employed at the minimum of the schedule ($1,400). This total of $9,700 must be reduced somewhat because some of the new teachers receive salaries above $1,400 (see column 7). The aggregate salary paid these five teachers who are not employed at the minimum salary is $700 more than it would be if they were employed at the minimum. Hence the net saving on replacements is $9,000.

5. Remain. This is the distribution of teachers who remain in service for the following year. All of these except those already at the maximum will receive increments (in accordance with the assumption that increments are automatic; if they were not, more detailed calculations would be required to determine which teachers would advance). Ninety teachers are not at the maximum and will receive increments which aggregate $9,000. Note that this amount equals the amount saved by replacing the teachers who leave the system (column 4). This equality is a necessary condition of stabilization. When all increments are equal, as they are in this schedule, the equality can be just as well expressed in terms of increments (i.e., 90 increments) instead of in terms of dollars.

6. New Level. This column gives the distribution for the succeeding year for those teachers who were in the system the preceding year and who have been advanced one increment.

7. Employ. The teachers who leave the service are replaced each year by an equal number. Thirty-five of these are employed at the minimum salary and five others are employed at a higher level owing to previous service elsewhere which is recognized.

8. Teachers. This is the distribution of teachers for the succeeding year. The cycle has been completed, and the distribution is the same as that of the preceding year. This indicates that stabilization has taken place. Under practical conditions, perfect stabilization probably never occurs. Theoretical stabilization as indicated here may, however, represent a tendency around which variations in a practical situation tend to average for at least a few years.
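The arithmetic of Table I can be checked mechanically. The sketch below encodes the table's columns 2, 3, and 7, computes the net saving on replacements and the total of increments, and advances the remaining teachers one step; it assumes, as the table does, that increments are automatic and annual. The variable and function names are illustrative.

```python
MINIMUM, MAXIMUM, INCREMENT = 1400, 2000, 100
LEVELS = range(MINIMUM, MAXIMUM + INCREMENT, INCREMENT)  # 1400, 1500, ..., 2000

teachers = {2000: 80, 1900: 8, 1800: 11, 1700: 16, 1600: 23, 1500: 29, 1400: 35}  # column 2
leave = {2000: 8, 1900: 0, 1800: 3, 1700: 5, 1600: 7, 1500: 8, 1400: 9}          # column 3
employ = {1600: 2, 1500: 3, 1400: 35}                                              # column 7


def advance_one_year(teachers, leave, employ):
    remain = {s: teachers[s] - leave[s] for s in teachers}               # column 5
    gross_saving = sum(n * (s - MINIMUM) for s, n in leave.items())       # column 4 total
    extra_for_credit = sum(n * (s - MINIMUM) for s, n in employ.items())  # hires above minimum
    net_saving = gross_saving - extra_for_credit
    increments = sum(n * INCREMENT for s, n in remain.items() if s < MAXIMUM)
    new_level = dict.fromkeys(LEVELS, 0)                                  # column 6
    for s, n in remain.items():
        new_level[min(s + INCREMENT, MAXIMUM)] += n
    next_year = {s: new_level[s] + employ.get(s, 0) for s in LEVELS}      # column 8
    return net_saving, increments, next_year


net_saving, increments, next_year = advance_one_year(teachers, leave, employ)
print(net_saving, increments)  # 9000 9000
print(next_year == teachers)   # True: the distribution has stabilized
```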

It is of interest to observe that stabilization is a necessary outcome even with a schedule which does not contain any stated maximum, because the tenure of teachers has natural limits, and no teacher can receive increments indefinitely. The only requirements for stabilization are that the schedule specify the amount of the increments, and that conditions be reasonably regular. It is, of course, possible under field conditions for changes in the staff to be so irregular or erratic, owing to changes in administrative policies, economic cycles, etc., that large variations up and down will occur in the annual total of salaries; but under ordinary conditions there will likely come at least a short period when the distribution of salaries has become stabilized, and, except for expansion, the total outlay for personnel from




one year to the next will remain relatively 
stationary. 

A stabilized distribution of teachers' salaries does not mean the maximum payroll possible under a given salary schedule. This maximum may readily be calculated by multiplying the maximum salary provided in the schedule (assuming there is a maximum) by the number of teachers in service, or likely to be in service, at any given time. Stabilization may, theoretically at least, occur on the basis of any salary level from the lowest in the schedule to the highest. For example, all of the teachers might leave the service each year and be replaced annually by teachers at the minimum salary; or, at the other extreme, all teachers in the system might be at the maximum salary and all remain in service over a period of years. Both situations represent stabilization since the saving on replacements and the total of increments are both zero, and hence equal; but the first represents stabilization on the level of the minimum salary and the second represents stabilization on the level of the maximum salary. Stabilization will normally take place on the basis of a salary level nearer the average salary provided by the schedule, the exact point depending primarily on the turnover rate. If the turnover is high, the total payroll will be low, and if the turnover is low, the payroll will be high; the payroll may vary between the limits represented by the two extreme situations mentioned.
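The dependence of the stabilized payroll on turnover can be illustrated with a deliberately simple model that is not part of the author's procedure: a fixed staff, the same fraction of teachers leaving at every salary level each year, all replacements hired at the minimum, and automatic annual increments on the $1,400 to $2,000 schedule used in Table I. Iterating the model until it settles gives the stabilized payroll for a given turnover rate; the function name, staff size, and rates are hypothetical.

```python
def stabilized_payroll(turnover, staff=200, minimum=1400, increment=100, steps=6, years=300):
    """Payroll after `years` of a hypothetical regime: a fixed staff, the same fraction
    `turnover` leaving at every level each year, all replacements at the minimum,
    and automatic annual increments up to minimum + steps * increment."""
    salaries = [minimum + k * increment for k in range(steps + 1)]
    dist = [float(staff)] + [0.0] * steps  # start with everyone at the minimum
    for _ in range(years):
        remain = [n * (1 - turnover) for n in dist]
        hires = staff - sum(remain)
        dist = [0.0] * (steps + 1)
        for k, n in enumerate(remain):
            dist[min(k + 1, steps)] += n   # one increment, capped at the maximum
        dist[0] += hires                   # replacements enter at the minimum
    return sum(n * s for n, s in zip(dist, salaries))


for rate in (0.0, 0.1, 0.2, 0.4, 1.0):
    print(rate, round(stabilized_payroll(rate)))
```

With no turnover the model settles with everyone at the maximum ($400,000 for 200 teachers), and with complete turnover everyone is at the minimum ($280,000), the two extreme cases described above; intermediate rates fall between them, lower as turnover rises.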

It should be noted further that stabilization of the salary distribution does not necessarily imply any permanence in the resulting payroll total; technically all that it means is that this total will remain constant, except for expansion in the system, so long as the rates of change in the teaching staff remain constant. Practically, the term connotes a reasonable stability in the factors which yield the required conditions, so that there will be some probability of the total remaining constant for a long enough time to be administratively significant. On this basis the extreme conditions previously referred to are somewhat repugnant to the sense of the term, since all extremes are of necessity inherently unstable; but they must nevertheless be




recognized as legitimate limits of variation. As a practical matter any stability is of limited duration, owing to the changes in conditions, either internal or external, which are too complex and too sensitive to remain even approximately constant for any great period of time. Stabilization of the salary distribution means merely that subsequent fluctuations of the payroll, or shifts of the total to entirely new levels, are due primarily to changes in the rate of expansion and turnover in the teaching staff rather than to the inauguration of the new salary schedule.

Preliminary estimates of the stabilized cost of a new salary schedule are called for when the schedule is being considered for adoption, and new estimates are usually desired after the schedule has been in effect for a few years in order to see whether the payroll is approaching the total predicted. Estimation in advance of any experience with a particular schedule is, of course, precarious, and the methods which are applicable are strictly limited. In the following discussion six methods for making estimates are reviewed, some of which are applicable either to preliminary or to advance predictions; the discussion, however, is based on the assumption that a schedule has been in effect for a few years, since this situation affords a larger number of attacks, and the work is more complex, if not more difficult. From time to time reference will be made to the Cincinnati situation, where such a study has been under way.

Perhaps the first method of estimation 
that will occur to the statistically minded 
is that of fitting a curve to the salary 
totals for the years during which the 
schedule has been in effect, and projecting 
this curve to a maximum value. But under 
practical conditions this method will prob- 
ably be quickly abandoned. Present salary 
schedules have not generally been in use 
long enough to yield a set of points suffi- 
cient in number and definitely characteris- 
tic in pattern to indicate with any reli- 
ability the value sought. For the first 
five or six years of a new schedule the annual totals may present a trend that is almost linear. To estimate a maximum level










from data that do not outline definitely the 
approach to that value is precarious. 

Further, the method of introducing new 
salary schedules may cause such marked ir- 
regularities in the distribution of teach- 
ers' salaries that even if the data pointed 
to a true maximum in the curve this might be 
higher or lower than the typical level for a 
stable distribution. It should be noted in 
passing that if this method is used, the effect of expansion, or secular trend as the
economists are likely to term it, must be 
removed with considerable care; it is likely 
to be anything but linear under the influ- 
ence of such conditions as have prevailed in 
recent years, and so cannot be removed in 
the usual mathematical manner. A curve fit 
to annual totals may be of some interest as 
a gross check on the result of more refined 
calculations, but it is not likely under 
the existing conditions to give the desired 
answer with much dependability. 
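The text does not say which curve would be fitted. As one illustrative choice, a modified exponential y = a + b·r^t, whose asymptote a plays the role of the maximum, can be fitted by the method of three partial sums, which needs a number of annual totals divisible by three. The function name and the synthetic totals are assumptions for illustration. With nearly linear totals the ratio of successive partial-sum differences comes out close to 1 and the fitted asymptote becomes very sensitive to small irregularities, which is the difficulty described above.

```python
def fit_modified_exponential(totals):
    """Fit y_t = a + b * r**t (t = 0, 1, ...) by the method of three partial sums.
    Returns (a, b, r); when 0 < r < 1, a is the level the totals approach."""
    n, extra = divmod(len(totals), 3)
    if n == 0 or extra:
        raise ValueError("need a number of annual totals divisible by 3")
    s1, s2, s3 = (sum(totals[i * n:(i + 1) * n]) for i in range(3))
    if s2 == s1:
        raise ValueError("first two partial sums are equal")
    ratio = (s3 - s2) / (s2 - s1)  # equals r**n for an exact curve
    if ratio <= 0 or ratio == 1:
        raise ValueError("totals do not follow a modified exponential")
    r = ratio ** (1 / n)
    b = (s2 - s1) * (r - 1) / (ratio - 1) ** 2
    a = (s1 - b * (ratio - 1) / (r - 1)) / n
    return a, b, r


# Synthetic totals (in thousands of dollars) approaching 1,000.
totals = [1000 - 400 * 0.7 ** t for t in range(6)]
a, b, r = fit_modified_exponential(totals)
print(round(a, 1), round(b, 1), round(r, 3))  # 1000.0 -400.0 0.7
```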

As a second possible method of estima- 
tion, it has been proposed that the number 
of years required for a beginning teacher 
to reach the maximum scheduled salary sets 
a maximum limit to the number of years re- 
quired for stabilization. That is, if a 
schedule provides ten annual increments, 
then at the expiration of ten years' time a 
complete “schedule cycle" will have elapsed, 
and all teachers in the system when the 
schedule was inaugurated will have had an 
opportunity to reach the maximum, and hence
by this time the distribution of salaries 
should be stabilized. Further, it is argued, 
the necessary period will be reduced by the 
advance on the schedule represented by the 
average salary granted teachers in service 
when the new schedule is inaugurated. 

This "schedule cycle" concept is theoretically correct, but in practice there are
a number of reasons why it does not prove to 
be reliable. The first and most obvious one 
is that, wherever increments are not entire- 
ly automatic, teachers may not progress 
through the schedule so rapidly. In Cin- 
cinnati, for example, where increments are 
virtually automatic, there are intermediate 
maxima, and various other conditions which 
retard the progress of those who have been 
in the system for a long time and who do not 
have the training expected in the present 










schedule for the new teachers. A second interference with the perfect operation of the schedule cycle is that the length-of-service distribution of those who drop out is likely to be significantly altered for a few years after the introduction of a new schedule, after which this distribution may tend to settle again. The stabilization of the payroll total will not occur until a full schedule cycle has elapsed subsequent to any large shifts in the distribution of those who leave the system.

There are several other reasons why this method is not satisfactory. A third objection is that the payroll can theoretically stabilize at such a point that no teacher will reach the maximum salary (on account of teachers dropping out too quickly), and under such circumstances stabilization might take place in a year or two. A fourth objection is that this method does not give any idea of the total payroll during the intervening period; it is known that the total may rise above the stabilization level, and that it may rise slowly or rapidly in the first few years. A fifth difficulty is that, if the system is expanding rapidly, it may be entirely misleading to credit the number of years of advance represented by the teachers in service at the introduction of the new schedule.

A third possible method consists in 
drawing general inferences from the present 
distribution of salaries. If this distri- 
bution shows a very large bunching of sal- 
aries at one or two points, owing presumably 
to the method of instituting the new sched- 
ule, the number of years required for these 
modal salaries to reach the maximum may be
estimated and taken with some reliability 
as the number of years yet required for 
stabilization. The success of this method 
rests on the assumption that the distribu- 
tion of salaries except for these bunchings 
is reasonably typical, and that this dis- 
tribution will support stabilization. When, 
therefore, these high frequencies have been 
removed from the progressing group through 
having reached the maximum, a stable distri- 
bution will, if the assumptions are ful- 
filled, be left. 

For this purpose it may be convenient 
to divide the salary scale into units of 


March, 1934


"increment years", indicating the number of 
years required for each salary to reach the 
maximum if it increases at the normal rate. 
If there are intermediate maxima, the dis- 
tribution will have to be divided according 
to the teachers who fall under each maximum,
and each group studied separately. While 
this method yields only a rough estimate, it 
has the advantage of being convenient; the 
necessary information is easy to get; les- 
sening of the totals of annual increments in 
later years may be detected; and the total 
increase in the payroll up to the time of 
stabilization may be estimated and added on 
to the present distribution. This has to
be corrected by an estimated amount for ex-
pansion. A distribution of salaries of
those who stop teaching is of value in in- 
terpreting the distribution more correctly. 
Several of the objections to the preceding 
method hold to some extent here. The valid- 
ity of the result depends primarily on the 
extent to which the distribution, with the 
larger modes removed, is stable. It may be 
found, for example, that the resulting dis- 
tribution, when studied in the light of the 
turnover distribution, will not provide the 
necessary conditions for stabilization, and 
some modification of the estimates will
have to be made.
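The third method can be sketched as follows. This is a minimal illustration, assuming a single maximum and one uniform increment. The function names, the 20 per cent share used to decide what counts as a "bunching" of salaries, and the sample staff are all hypothetical; they are not figures from the Cincinnati data.

```python
import math
from collections import Counter


def increment_years(salary, maximum, increment):
    """Years a salary needs to reach the maximum at the normal rate (0 if already there)."""
    return max(0, math.ceil((maximum - salary) / increment))


def modal_stabilization_estimate(salaries, maximum, increment, bunch_share=0.2):
    """Rough third-method estimate of the years still required for stabilization.

    Salaries are grouped into "increment years". Any group below the maximum that
    holds at least `bunch_share` of the staff is treated as a modal bunch (a
    hypothetical threshold). The function returns the largest number of increment
    years among the bunches, together with the whole increment-year distribution.
    """
    counts = Counter(increment_years(s, maximum, increment) for s in salaries)
    n = len(salaries)
    bunches = [y for y, c in counts.items() if y > 0 and c / n >= bunch_share]
    return max(bunches, default=0), dict(sorted(counts.items()))


# Hypothetical staff: a large bunch placed at $1,800 when the schedule was introduced.
staff = [1800] * 10 + [2200] * 3 + [2600] * 2 + [3000] * 5
years, distribution = modal_stabilization_estimate(staff, maximum=3000, increment=200)
print(years, distribution)  # 6 {0: 5, 2: 2, 4: 3, 6: 10}
```

The figure of six years is only the rough horizon the text describes. It can be relied on only if the salaries outside the bunch already form a distribution that would support stabilization.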

A fourth, and, in the past, perhaps the
most common method of estimating the future
cost of salary schedules, has been to uti-
lize successively several rates of change,
each one applied in turn to the base distri- 
bution, and effective for a given number of 
years. These rates are usually three--the 
rate of salary increase, the rate of saving 
on replacements, and the rate of expansion. 
The rate of salary increase is the amount 
that the total payroll will increase each 
year (because of increments) if the present 
teaching staff remains intact; that is, if 
none drop out and none are added.

This rate of increase can be secured in 
several ways, one of which is to multiply
the salary increment of the individual teach- 
er by the number of teachers not at the max- 
imum. The difficulty with this obviously is 
that if it is used for several years with a 
distribution which is not stabilized it be- 
gins to get seriously in error, owing to the 




Douglas E. 





fact that each year a larger number of these 


Scates 295 


teachers are likely to be at the maximum.
If there are several different sizes of in-
crements, the teachers will have to be di-
vided into groups accordingly and each
group treated separately. Another method
of calculating the rate of increase is to
study the record and status of each indi- 
vidual teacher, and estimate the salary she 
is likely to be receiving at the end of the 
given period of time. This is of course 
largely a matter of judgment, especially if 
intermediate maxima exist, but it may be 
necessary if the salary schedule is complex. 
If the increases are not automatic, a study 
of the experience with the schedule may 
yield a coefficient to be applied. Such de- 
tailed work is more appropriate to the two 
techniques of estimation to be described 
later. 
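The drift described above can be seen in a small sketch. It assumes an intact staff in which no one leaves and no one is added, with one increment size and one maximum. The functions and the three-teacher staff are hypothetical. Reusing the first year's increase adds $400 every year. When the teachers are actually advanced, the total levels off once they reach the maximum.

```python
def annual_increase(salaries, maximum, increment):
    """Rate of salary increase: one increment for every teacher below the maximum."""
    return increment * sum(1 for s in salaries if s < maximum)


def totals_with_fixed_rate(salaries, maximum, increment, years):
    """Payroll totals if the first year's increase is simply reused every year."""
    rate = annual_increase(salaries, maximum, increment)
    base = sum(salaries)
    return [base + rate * y for y in range(1, years + 1)]


def totals_with_advancement(salaries, maximum, increment, years):
    """Payroll totals when each teacher is actually advanced and held at the maximum."""
    staff = list(salaries)
    totals = []
    for _ in range(years):
        staff = [min(s + increment, maximum) for s in staff]
        totals.append(sum(staff))
    return totals


# Hypothetical intact staff (no one leaves, no one is added).
staff = [2600, 2800, 3000]
print(totals_with_fixed_rate(staff, 3000, 200, 3))   # [8800, 9200, 9600]
print(totals_with_advancement(staff, 3000, 200, 3))  # [8800, 9000, 9000]
```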

The annual saving on replacements is 
the total amount saved by virtue of teach- 
ers who are well along in the schedule leav- 
ing and being replaced by new teachers near 
the beginning of the schedule. The average 
saving on replacement can be calculated by 
finding the average salary of all teachers 
who stop teaching in the system, and sub- 
tracting from this the average salary of 
all new teachers who are hired; the total 
saving on replacement is this difference 
multiplied by the average number of teach- 
ers who stop teaching (or, if the system is 
reducing its staff, by the average number 
of new teachers hired). 

To this figure must be added one in-
crement for each teacher who stopped at the
end of the year and who was not at her max-
imum, for the first rate used has already
assumed that her increment is given. Again, 
this total cannot be used cumulatively year
after year without change with a distribu- 
tion which is not stabilized, for presumably 
a larger portion of the teachers will reach 
their maximum and their salaries will not 
then be increasing any more. With a chang- 
ing distribution this rate really needs a 
redetermination for each year of its use, 
and this is not easy to do in terms of a 
rate by itself. 
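The saving-on-replacement calculation can be sketched as follows, including the correction of one increment per leaver below the maximum. The sketch uses a single typical year in place of the averages over several years that the text calls for, and the salaries are hypothetical. The text says to multiply by the number hired when the staff is being reduced; here that rule is approximated by taking the smaller of the number leaving and the number hired.

```python
from statistics import mean


def saving_on_replacement(leaver_salaries, new_salaries, maximum, increment):
    """Annual saving on replacement, following the steps described in the text.

    leaver_salaries -- salaries of the teachers who stop teaching in a typical year
    new_salaries    -- salaries of the new teachers hired in that year
    """
    average_saving = mean(leaver_salaries) - mean(new_salaries)
    # Multiply by the number leaving, or by the number hired if the staff is shrinking.
    replaced = min(len(leaver_salaries), len(new_salaries))
    total = average_saving * replaced
    # Add one increment per leaver below the maximum: the rate of salary
    # increase has already assumed that these teachers receive it.
    total += increment * sum(1 for s in leaver_salaries if s < maximum)
    return total


print(saving_on_replacement([2400, 3000, 1800], [1200, 1200, 1200], 3000, 200))  # 4000
print(saving_on_replacement([2400, 3000, 1800], [1200, 1200], 3000, 200))        # 2800
```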

Calculation of the third factor, rate 
of expansion in the teaching staff, involves 
more judgment than it does mathematics. One 
will want to know the average increase in 











296 JOURNAL OF EXPERIMENTAL EDUCATION 


the staff for some years previous, and he 
will want to know the average rate of growth 
in the city population and how many new 
teachers this would normally call for. But 
he must also know that in times of finan- 
cial stringency the tendency is to increase 
the teaching load of those already employed, 
and to maintain their salaries insofar as 
possible, rather than take on additional 
employees. This calls for an intimate 
knowledge of the situation more than it does 
for techniques. How heavily burdened is the 
present staff, and what is the general at- 
titude of the administration? Whatever ex- 
pansion rate is used, it is not only cumula- 
tive from year to year, but all additional 
teachers carry annual increments until such 
time as they may reach their maximum.
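The point that expansion is cumulative, and that every added teacher then carries increments, can be illustrated with a short sketch. It assumes a constant number of additions each year, all entering at the same salary, and it ignores turnover among the added teachers. The function name and the figures are hypothetical.

```python
def expansion_payroll(added_per_year, entry_salary, maximum, increment, years):
    """Extra payroll in each future year from teachers added for expansion.

    Each year's additions enter at `entry_salary`. Earlier additions receive an
    increment every year until they reach the maximum. Turnover among the added
    teachers is ignored in this sketch.
    """
    added = []  # current salaries of all teachers added so far
    costs = []
    for _ in range(years):
        added = [min(s + increment, maximum) for s in added]
        added.extend([entry_salary] * added_per_year)
        costs.append(sum(added))
    return costs


print(expansion_payroll(2, 1200, 3000, 200, 3))  # [2400, 5200, 8400]
```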

A few words will be given in general 
appraisal of this method. It promises more 
dependable results than any of the first 
three methods described. It is analytical 
to some extent, both in method and in re- 
sult, for it utilizes three basic rates 
rather than a single gross rate, and it 
gives separate totals for these three steps 
rather than a single answer. The adminis- 
trator can thus see what part turnover and 
expansion play in building up the total cost 
of the payroll. This method was used, with 
considerable success, for making estimates 
of the cost of the present Cincinnati salary 
schedule when it was being considered, in 
1926.¹

There are, however, certain basic de- 
fects in the method, at least from a theo- 
retical point of view. The rates are all 
applied in turn to the base year, while in 
the actual teaching population the tenden- 
cies and rates of change do not take place 
one at a time for a given number of years, 
but they occur simultaneously and continu- 
ously, so that they are mutually interactive, 
each change affecting the others. Actually,
therefore, we have changing rates acting
upon changing bases. It is possible to make 
allowance for certain changes in the rates, 
though this is not typical of the method. 
Any abnormality in the base year total will 
be reflected rather directly in the final 










estimate. The method does not yield a dis-
tribution of salaries, but only the several
totals. Finally, the method is not adapted 
primarily to estimating the time when 
stabilization will occur, but is adapted 
preeminently to giving an estimate of the 
cost for any year which is a certain number 
of years in the future. It can do this di- 
rectly, since it is a multiplication meth- 
od, and does not need to be concerned with 
intervening years. Stabilization is as-
sumed when the estimates for successive
years are approximately equal. 

There are two further methods which
will be described. These appear to be bet- 
ter adapted to the solution of the problem 
than any of the four which have been re- 
viewed. These methods are based primarily 
upon distributions rather than upon rates, 
and are likely to be more sensitive to 
changes in the various factors affecting 
the real situation. Also, they afford a 
certain satisfaction to the worker by vir- 
tue of the tangibleness of the intermediate 
products, in contrast with the abstractness 
of the rates of change. The writer admits 
to certain misgivings in connection with 
the routine use of several rates which are 
interrelated, and which, both in derivation 
and application, may be influenced spurious- 
ly by factors which are not readily dis- 
cernible. 

The first of these methods utilizes 
distributions of length of service of the 
teaching staff, and converts these into 
salary distributions on the basis of an es- 
tablished or estimated relationship. The 
advantage of using length-of-service dis- 
tributions is that these will presumably be 
less affected by the introduction of a new 
salary schedule than will the distribution 
of salaries, so that predictions may more 
safely be based upon them. This method is 
of course inapplicable where length of 
service is not an important factor in the 
new salary schedule. 

Either the present distribution of
length of service, or a constructed dis- 
tribution, may be used. Probably both 
should be studied as checks upon each other. 





1. "The Probable Cost of Putting the Proposed Schedule into Operation", Report of the Committee on the Study of Salaries in the Cincinnati Public Schools, Sec. VI (May 24, 1926), pp. 44-52. This section is reproduced in full in Willard S. Elsbree, Teachers' Salaries, Contributions to Education (New York: Bureau of Publications, Teachers College, Columbia University, 1931), pp. 266-80.







Whether the total teaching experience of
teachers, or the length of service in the
present system only, is used will depend
upon which is recognized by the salary
schedule. The conversion of the distribution
of length of service into a distribution of
salaries should, ideally, be based upon the
regression line (possibly curved) for
salaries on length of service; not every
teacher will begin at the minimum of the
schedule and advance one increment every
year if the schedule recognizes service in
other systems, has conditional increments,
and contains intermediate maxima. Practically,
however, such a regression, if based upon
an old and very different salary schedule,
may be more misleading than helpful, and it
may be necessary to assume a linear relation
between service and salary, reducing the
slope of the regression line somewhat where
increments are conditional. Some judgment
will be called for in determining a line
that is reasonable under the existing
conditions.

This distribution of present lengths of
service may, however, be unsatisfactory for
various reasons. The present length of
service of teachers may be in a state of
change; the distribution may be the product
of factors which are not important ones at
the present; the distribution may not
support a stabilized distribution of
salaries. One will probably desire to check
upon the present situation by constructing a
probable future distribution of length of
service, which will have the advantage of
representing more recent factors. This may
be done by using the distribution of length
of service of teachers who have left the
service in recent years. The teachers who
leave the system constitute the "first
differences" in the series of frequencies
representing those who remain. An average,
or typical, annual distribution of length of
service of those who leave can therefore be
subtracted successively from the
distribution of those now in service, the
years of service of those remaining being
increased and the probable new teachers
being added between each subtraction. There
are short-cuts to this work, and there are
some problems connected with properly
introducing the teachers who will be hired
with initial credit for outside service, but
these details call for more explanation than
seems warranted here.

Perhaps it should be pointed out that
the distribution of those who remain in
service is not likely to be the same as the
distribution of those who leave the service.
It might be assumed that the teachers who
leave represent a percentage sample of those
who are in the service. In fact one writer
states, "It is fairly safe to assume.....that
the teachers leaving are distributed along
the salary scale in exactly the same
proportion as the total personnel, that is,
if ten per cent of the total personnel is at
the sixth step of the schedule, ten per cent
of those leaving will also be at the sixth
step."¹ The turnover, however, is likely to
be disproportionately large among those with
a short service. For example, in Cincinnati,
teachers with less than three years of
service constitute fifteen per cent of those
who remain, but comprise thirty per cent of
those who drop out.

Technically, those who drop out
constitute, in a stabilized population, the
series of first differences for the
distribution of those who remain. These
first differences may reflect a variety of
different influences, and so may take almost
any form themselves, producing some other
form in the distribution of teachers
remaining in service. When the distribution
of those who stop forms a geometrical
series, then the distribution of those
remaining will take on the same shape,
logarithmically, and the number who drop out
at each service level will be a constant per
cent of those remaining. But unless these
conditions are fulfilled, this will not be
so, and it is scarcely to be regarded as
typical.
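The construction of a future length-of-service distribution can be sketched as follows, along with the sense in which those who leave are the "first differences" of those who remain. The sketch rests on several assumptions that do not come from the text. Service is grouped into whole years with an open-ended top group. The typical leavers are fixed counts. Recruits enter with no credit for outside service. The three-level example is hypothetical. In the stable example, the leavers at the first two levels (20 and 16) are exactly the first differences of the distribution, and each is 20 per cent of its level, which is the geometric case described above. Everyone still present at the top level is assumed to leave.

```python
def age_one_year(dist, leavers, new_teachers):
    """Advance a length-of-service distribution by one year.

    dist[k]    -- teachers with k years of service (the last entry is open-ended)
    leavers[k] -- typical annual number leaving with k years of service
    """
    remaining = [max(n - l, 0) for n, l in zip(dist, leavers)]
    aged = [new_teachers] + remaining[:-1]  # everyone gains a year; recruits enter at 0
    aged[-1] += remaining[-1]               # the top group keeps those already in it
    return aged


def project_service(dist, leavers, new_teachers, years):
    """Successive constructed distributions, one per future year."""
    history = []
    for _ in range(years):
        dist = age_one_year(dist, leavers, new_teachers)
        history.append(dist)
    return history


def first_differences(dist):
    """Drop in frequency from each service level to the next."""
    return [a - b for a, b in zip(dist, dist[1:])]


# A stabilized case: 20 per cent leave at each level, and all leave after the top level.
stable = [100, 80, 64]
print(project_service(stable, [20, 16, 64], 100, 2))  # [[100, 80, 64], [100, 80, 64]]
print(first_differences(stable))                      # [20, 16]
```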

The distributions of salary which are 
finally estimated from the present distri-
bution of length of service and from the
constructed distribution must be tested
for stabilization. A stabilized distribu- 
tion must have enough teachers at the maxi- 
mum, or in other groups which do not receive 
increments, so that the saving in salaries 
made possible by the usual turnover will 





1. Willard S. Elsbree, Teachers' Salaries, Contributions to Education (New York: Bureau of Publications, Teachers College, Columbia University, 1931), pp. 152-33.


equal the increments of the teachers who
stay in service. If the distributions se-
cured are not stable under this test, then
either a larger per cent of teachers must 
be placed in the "stationary" groups, or the 
rate of turnover and the average saving on 
replacement must be increased. Since these 
changes are inevitable in the real situation, 
the statistician has good grounds for modi- 
fying his distributions so as to conform to 
them. In addition to this, the distribu- 
tions should be modified by the inclusion of 
any special payments which are made outside 
the schedule, such, for example, as extra
compensation for teachers doing special
work.
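The stabilization test can be sketched as follows. This minimal version assumes one increment size, a single maximum, and that each leaver is replaced at the entering salary, so that the staff size does not change. The function name and the staff lists are hypothetical. The test compares the year's saving on replacement with the increments paid to those who stay. In practice one would accept approximate rather than exact equality, which the `tolerance` parameter allows.

```python
def stabilization_test(salaries, leaver_salaries, entry_salary, maximum, increment,
                       tolerance=0):
    """Compare the saving from turnover with the increments of those who stay.

    Each leaver is assumed to be replaced at `entry_salary`, so the staff size is
    constant. `leaver_salaries` must be drawn from `salaries`.
    """
    saving = sum(s - entry_salary for s in leaver_salaries)
    stayers = list(salaries)
    for s in leaver_salaries:
        stayers.remove(s)  # raises ValueError if a leaver is not on the staff
    increments = sum(min(increment, maximum - s) for s in stayers if s < maximum)
    return abs(saving - increments) <= tolerance, saving, increments


print(stabilization_test([1200, 1400, 1600, 1800, 3000], [1800], 1200, 3000, 200))
# (True, 600, 600)
print(stabilization_test([1200, 1400, 1600, 3000, 3000], [3000], 1200, 3000, 200))
# (False, 1800, 600)
```

In the second case the saving exceeds the increments, so the payroll would still be falling. That distribution would need modification before it could be taken as stable.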

While this method has certain points 
of advantage, there are a number of ele- 
ments of danger, some of which have been 
referred to. If other factors than length 
of service are important in determining the 
distribution of salaries, these must be con- 
sidered. If training is important--that is, 
if it is given a large recognition by the 
schedule, and if in practice teachers enter 
with a wide variety of levels of training-- 
an estimate must be made of the extent to 
which the average level of training in the 
next few years will depart from the average 
level of training of those who have been in 
the system for some time. If the kind or
level of work is important, then each level 
must be treated separately. Any unrepresen- 
tativeness in the distribution of length of 
service of those who leave the system will 
be exaggerated in the constructed distribu- 
tion. The conversion of the final distri- 
bution into a distribution of salaries calls 
for the exercise of considerable judgment in
the absence of a satisfactory regression 
equation. Finally, while the method gives 
an estimate of the total salary that the
schedule will call for, it does not yield 
directly the number of years that will be 
required to reach this condition. This fig- 
ure can be estimated from the total and an- 
nual increases. 

The sixth and final method to be pre- 
sented in this paper builds up annual salary 
distributions directly. This is probably
the soundest of all methods, though under
certain circumstances it may not be superi-
or to the method just described. The sixth
method is normally the most sensitive to 
minor variations, and while it must depend 
to a large extent upon the rates and dis- 
tributions which can be secured from experi- 
ence, it does permit of the modification of 
these in the course of estimating if the 
conditions would seem to call for it. This 
method also permits the effects of all of 
the factors to be operative in the forming 
of each annual distribution, so that their 
joint result is used as the base for the 
next year's estimate, thus closely paral- 
leling actual conditions. Further, the in- 
fluence of the existing salary distribution 
is considerably modified by the successive 
changes that are made, so that the charac- 
ter of the final distribution is not unduly 
influenced by abnormalities in the present 
distribution. 

It was this method that was used by
Whelpton,¹ of the Scripps Foundation for Re-
search in Population Problems, in his very
detailed and difficult prediction of the
population of the United States up until
1975. Elsbree describes this method in his
recent treatise on teachers' salaries.² It
has also recently been used in the study of
the Cleveland teachers' salaries.³ The meth-
od begins with the present distribution of
salaries, subtracts from this the distribu- 
tion of salaries of those who stop teach- 
ing, adds the increments for the remaining 
teachers who will probably receive them, and 
adds the distribution of salaries of the new 
teachers who will probably be hired, perhaps 
including expansions as well as replace- 
ments. The resulting distribution is then 
used as a base for estimates for the follow- 
ing year and the process is repeated for 
each year as long as desired. In carrying 
forward the work, attention must be given to 
many of the details previously mentioned in 
connection with the other methods, which 
will not be restated here. 
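The annual cycle of the sixth method can be sketched on a salary frequency distribution: subtract those who stop, give increments to those who remain, add the new teachers, and repeat. This sketch assumes automatic increments of one size and a single maximum. The `Counter` representation, the function names, and all figures are hypothetical. Because the hypothetical leavers are all at the maximum, nobody leaves in the first two years, and the staff grows from 30 to 50 before the totals settle.

```python
from collections import Counter


def next_year(dist, leavers, hires, maximum, increment):
    """One cycle of the sixth method on a salary frequency distribution.

    dist, leavers, hires -- Counters mapping salary -> number of teachers.
    """
    remaining = dist - leavers  # subtract those who stop (non-positive counts dropped)
    advanced = Counter()
    for salary, n in remaining.items():
        advanced[min(salary + increment, maximum)] += n  # increments, held at the maximum
    return advanced + hires  # add the newly hired teachers


def project(dist, leavers, hires, maximum, increment, years):
    """Repeat the cycle, using each estimate as the base for the following year."""
    totals = []
    for _ in range(years):
        dist = next_year(dist, leavers, hires, maximum, increment)
        totals.append(sum(salary * n for salary, n in dist.items()))
    return dist, totals


start = Counter({1200: 30})    # hypothetical staff, all placed at $1,200
leavers = Counter({1600: 10})  # typical annual leavers, all at the $1,600 maximum
hires = Counter({1200: 10})    # typical annual hires
final, totals = project(start, leavers, hires, maximum=1600, increment=200, years=4)
print(totals)  # [54000, 74000, 74000, 74000]
```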

The degree of complexity of this pro-
cedure depends largely on the extent to
which the increments are, in practice, con- 
ditional. If increments are automatic and 





1. P. K. Whelpton, "Population of the United States, 1925 to 1975", The American Journal of Sociology, XXXIV (September, 1928), pp. 253-270.
2. Op. cit., pp. 124-64.
3. T. C. Holy, and others, Cleveland Teachers' Salaries, Monograph No. 16 (Columbus, Ohio: Bureau of Educational Research, Ohio State University, 1932), pp. 19-34.

there is only one maximum, all salaries
except those at the top may be advanced; the
tabulation scale for one's frequency
distribution can physically be moved one
step for the next year. The procedure is
merely a continuation, for a number of
years, of the process shown in a simpler
form in the illustration previously given.
If there are several different increment
rates and several maxima, more detailed
methods will be called for. The entire
process can be broken up into separate
parts, one for each maximum and increment
rate, if this does not too seriously reduce
the reliability of the several distributions
through decreasing their populations.
A more refined method of forming these
estimated annual distributions is to study
the record of each teacher individually and
estimate the increments she will receive for
each of a series of years. Such a study was
recently made in Cincinnati for one, two and 
years in advance. At the present time 
a study is being made which will yield ob-
jective data on the rates at which teachers


- conditions of salary, training, age, ex- 
perience, etc. These rates will then be ap- 
plied to small groups of teachers who are 
homogeneous with respect to the significant
factors, in order to build up estimates of
the salary increments of various groups in
the teaching population for each of a number
of years in the future. The different
groups will then be combined to form an
estimated salary distribution for each of
these future years, as required. The results
will be checked by certain of the other
methods described which can be readily
applied on the basis of the data which are
in hand.


SUMMARY 


In estimating the future cost of a
salary schedule, we are interested, not in
the maximum possible payroll, but in the 
maximum that the schedule will attain under 
typical conditions. This we may refer to as 
the stabilized salary cost. This may, theo-
retically, occur on the basis of any average
salary from the minimum in the schedule to 
the maximum; practically, stabilization is 
likely to occur on the basis of an average 
salary somewhat nearer the maximum than the 
minimum. The particular value of the sta- 
bilized average salary for a given schedule 
is determined by rates of change in person- 
nel and in salary, which take place in the 
teaching staff--particularly the rates of 
turnover, expansion, advancement through 
the schedule, and saving on replacement. 

Stabilization on any particular level 
is not to be thought of as permanent; the 
conditions that we regard as normal for one 
period are not normal after that period has 
come to an end. The average salary of
stabilization is changed by changes in the
rates referred to, and these rates are, in
turn, changed by variations in more funda-
mental, economic and professional condi-
tions. Hence, stabilization for a given
salary schedule is to be regarded as endur-
ing only for the length of a particular pe-
riod. Boards of education and administra-
tors can never regard the cost of a given
salary schedule as permanently fixed. All
estimates of future cost are necessarily 
conditioned by the possibility of changes 
in the underlying situation. 

Six approaches to the problem of arriv- 
ing at an estimate of what a salary schedule 
will cost when it has been in effect long 
enough for the distribution of salaries to 
become stabilized, have been reviewed. The 
first two were gross, normally unreliable
methods, one depending upon the projection 
of a curve of totals, the other taking 
stock of the number of years already 
elapsed out of the maximum number of incre- 
ments which the schedule provides. The 
third method consists in studying modal 
frequencies in the present salary distribu- 
tion, in terms of remaining "increment
years", and the fourth applies, for a given 
period of years, rates of advance, saving 
on replacement, and expansion, to a base 
distribution. The last two methods construct
annual distributions of teaching
population, the former with respect to
length of service, and the latter with
respect to actual salary. These methods are
presented somewhat roughly in order of the
amount of information which is required, and
in what the writer regards as their
sensitiveness to the many factors which
actually affect the total payroll. Also, the
last two methods will yield more information,
for a distribution tells more than a total,
or average, alone.

1. "A Study of the Cost of Teachers' Salaries in the Cincinnati Public Schools", (June, 1931), 27 pp.
2. As implied in the text, the distribution for any year is actually formed from the estimates of individual teachers' salaries for that year. Hence when this method of estimating advance through the schedule is used, the distributions become a secondary rather than an essential feature.

In considering these methods one may 
desire to bear in mind a general principle: 
the more complex the method the more sensi-
tive it is capable of being, but, at the
same time, the greater the number of oppor- 
tunities there are for error. Thus the 
methods which are theoretically better are
not always preferable in practice. Unless 
one is prepared to safeguard a complicated 
procedure, he should not undertake to use 
it. When given proper care, however, the 
more highly sensitive processes are capable
of greater adaptation to actual conditions, 
and hence of yielding more accurate and 
more trustworthy results. 


"PERSEVERATION" IN A GROUP OF SUBNORMAL CHILDREN 


by 


K. H. Rogers 
University of Toronto 


Introduction 

It has appeared from a previous study¹
conducted at the Ontario Hospital, Orillia,²
that the approach through achievement rec-
ords was not likely to reveal the presence
of factors other than "g" (intelligence) in
the case of children of low intelligence.
This, we have contended, is probably due to
the high degree of relationship between "in-
telligence" and "achievement" in this partic-
ular situation. Since, however, the fields
of activity represented by the Training
School programme are limited in scope, a
search with other records might indicate
variables in personality that could be in-
vestigated eventually by the tetrad tech-
nique.³


Preliminary Study 

In order that this aspect might be
explored, six cases were selected for
preliminary study. The bases of selection
were such that age and I.Q. were very
similar, the respective ranges being 15 to
16 years and 55 to 61. All cases had spent
about the same time in the Institution, and
were in the Upper School. Their etiological
and physical factors as known were likewise
approximately the same.

The available records included
developmental, physical, social, educational
and psychiatric data. The only aspects
wherein any marked differentiation could be
detected pertained to such personality
characteristics or traits as the following:
"lack of initiative", "hesitant", "listless",
"disinterested", as against "quarrelsome",
"restless", "active"; "unruly", "difficult
and unreliable", "obstinate", as against
"erratic", "careless", "vivacious";
"non-communicative" as against "talkative";
"will not play with others" as against "plays
with others"; "calm and deliberate" as
against "nervous, sensitive", and so on.
Without subscribing to any doctrine of
"types", or considering these characteristics
as related in any preconceived way, it was
felt that these differences might be
connected with some of the variables
supposedly represented in scores on
"personality" tests. "Perseveration" as
measured by Spearman suggested itself (6).
Other objective devices, such as Downey's
Test of Will-Temperament, were also used.
When the indications of "perseverative
tendency", supposedly reflected in tests of
"reversing triangles", "S-S" and letter
"sequence", "mirror-writing" and the like,
seemed to parallel some of the outstanding
differences above mentioned, it was decided
to explore more extensively the possibilities
of indicating the presence of the
"perseveration" factor in a larger group of
subjects. That this might prove to be a
fruitful approach was further suggested when
the records of behavior in school were again
examined; for the same contrast appeared to
be evidenced especially in connection with
"attitude towards work and work habits":
"persevering"--"gives up easily"; "late in
starting"--"prompt in starting"; "steady
worker"--"restless, requiring constant
supervision"; "drowsy, day-dreamer"--"active";
"careful, neat and orderly"--"careless,
untidy."

1. "Intelligence and Perseveration Related to School Achievement." (5)
2. The patients of the Ontario Hospital, Orillia, are subnormals. This institution has accepted the task of training as one of its major responsibilities. Two monographs have been published expressing this point of view. Ontario Hospital (Orillia) Publications, Vol. I, No. 1 (1929), No. 2 (1931).
3. The tetrad technique was devised in order that factors underlying intercorrelations could be isolated. For a full discussion of this approach, we consider it best to refer the reader directly to Spearman's Abilities of Man (6); especially Chapters VI, X and the Appendix.
The criterion of "tetrad differences" is expressed in the following formula, $r_{ap} \times r_{bq} - r_{aq} \times r_{bp} = 0$; r stands for the correlation, and the subscripts indicate the measures correlated. This formula is termed the "tetrad equation", and the value constituting the left side of it is the "tetrad difference."
Wherever the tetrad difference approximates zero, equi-proportionality is said to obtain. This means that the intercorrelations are due to the existence of a factor common to all four variables, and a specific factor peculiar to each of the variables.
The following tetrad illustrates this point:

          p.     q.
    a.   .80    .60
    b.   .40    .30

Here the tetrad equation is: $.80 \times .30 - .60 \times .40 = 0$.
When this criterion is not satisfied, as, for example,

          p.     q.
    a.   .80    .50
    b.   .40    .60

"overlap", or the presence of a "group" factor is suggested. That is, in the preceding tetrad, variables p and q are linked by more than the factor common to the other variables. A "significant" tetrad difference is where the difference value is three or four times that of the probable error of the tetrad.
A masterly discussion of the relationship of factorial analysis to psychological method is to be found in a recent article by W. Line, British Journal of Psychology, XXIV (October, 1933).
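The tetrad criterion given in the footnote on the tetrad technique can be evaluated directly. The sketch below computes the tetrad difference from four correlations and applies the rule that a difference three or four times its probable error is "significant". The probable error must be supplied by the caller, since its formula is not given here. The function names are hypothetical.

```python
import math


def tetrad_difference(r_ap, r_bq, r_aq, r_bp):
    """Left side of the tetrad equation r_ap * r_bq - r_aq * r_bp = 0."""
    return r_ap * r_bq - r_aq * r_bp


def is_significant(difference, probable_error, ratio=3.0):
    """Treat a tetrad difference as significant when it is `ratio` times its probable error."""
    return abs(difference) >= ratio * probable_error


# The illustrative tetrad: r_ap = .80, r_aq = .60, r_bp = .40, r_bq = .30.
d = tetrad_difference(r_ap=0.80, r_bq=0.30, r_aq=0.60, r_bp=0.40)
print(math.isclose(d, 0.0, abs_tol=1e-12))  # True
```

The `isclose` comparison is used because `.80 * .30` and `.60 * .40` differ slightly in floating-point arithmetic even though they are equal on paper.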


Main Investigations 

A group of thirty-four cases was se- 
lected according to the following criteria: 

C.A. 10.0 to 16.0 years. 

M.A. 7.0 to 11.6 years. 

I.Q. 50 to 70 (Stanford Revision, Binet-Simon Test).

Upper School of the Academic Department.

Free from "Medical conditions affecting learning."

It will be seen that again the purpose 
behind the selection was mainly that of se- 
curing as homogeneous a group as possible 
regarding the major known variables in this
situation. 





I. Tests of Perseveration 

The choice of a small battery of "per- 
severation" tests was guided by former in- 
vestigations in this field. Their nature is 
indicated in what follows; and the actual 
material used is reproduced in Appendix I. 
In these tests, a slightly modified task al- 
ternates with an original task, creating a 
disturbance in execution. The extent of 
the disturbance is taken as indicating the 
effect of "perseveration."

P - 1 - Reversed U's. (cf. Bernstein's Reversed
Triangles.)

P - 2 - Pied Type--underlining "at"--interference
caused by breaking the regularity of
presentation. (This test and Tests P - 3 and
P - 4 immediately below are modifications of
Lankes' Cancellation Test.)

P - 3 - Pied Type--2 paragraphs--crossing out
"a"-"c" in the first and "d"-"z" in the second.

P - 4 - Pied Type--underlining "man"--involving
disturbance of regular presentation.

P - 5 - Writing the first 9 letters of the
alphabet and the first 9 digits separately,
and then alternately, e.g., 1a2b3c, etc.

P - 6 - Reversed Triangles.


Scoring the Tests 

The method of scoring was that given by Lankes (2). The number of correct responses under conditions of interference was subtracted from the number of correct responses under normal conditions, and the result was divided by the latter.¹
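Lankes' scoring rule, as described above, reduces to a single ratio. The sketch below is a minimal Python illustration; the function name `perseveration_score` and the sample counts are hypothetical, since the article reports no raw counts.

```python
def perseveration_score(normal_correct: int, interference_correct: int) -> float:
    """Lankes-style score: (normal - interference) / normal.

    normal_correct: correct responses under normal conditions.
    interference_correct: correct responses under conditions of interference.
    A larger value means a larger drop in performance under interference.
    """
    if normal_correct <= 0:
        raise ValueError("normal_correct must be a positive count")
    return (normal_correct - interference_correct) / normal_correct


# Hypothetical counts: 40 correct under normal conditions, 30 under interference.
print(perseveration_score(40, 30))  # 0.25
```

Dividing by the normal-condition count expresses the loss relative to each subject's own baseline speed, so a slow worker and a fast worker can be compared on the same scale.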





Administration of Tests 

Several of these tests were first tried out as group tests. With the kind of cases with which we were dealing, this was decidedly unsatisfactory. Reducing a group to only six or eight subjects did not appear to improve matters to any great extent. The question of "rapport" between the experimenter and the subject with respect to this type of test, and more especially in this situation, assumed unexpected proportions and led to the adoption of an individual method of examination. Furthermore, it became evident that on no occasion could any subject be given more than three tests of the proposed battery at any one interview. The tests being used were greatly simplified modifications of tests suggested by the literature on "perseveration."








1. This method of scoring has persisted with remarkable consistency throughout investigations on this topic. See also Jones, Bernstein, Hargreaves, Pinard. During the scoring of "P" tests employed in a previous investigation (5), using 220 subjects, two other methods of scoring were investigated. The findings favored the one employed in the present investigation.





March, 1934


Even on this basis, it became an established opinion, before half the programme of applying the tests was accomplished, that the question of rapport was exceedingly important. "Restlessness", "unwillingness", "not effortful", "difficult resistance", "leisurely" and similar expressions are comments that appeared with unusual frequency in the experimenter's notes regarding the examinations. It is considered that institutional segregation may have much to do with this condition, since it is probably due to some character of the local situation making for a more or less general attitude on the part of the individuals involved. Thus the question of individual motivation became an additional problem, and one which is probably of more importance to the measurement of "perseveration" than we are wont to admit. The necessity of maintaining sustained speed in these test situations is considerably emphasized in the light of the many difficulties encountered.


Results 

The scores in each test were intercorrelated and the values are shown in Table I. The tetrad-differences for this Table are distributed in Figure 1.


TABLE I 


INTERCORRELATIONS--"P" TESTS 








P-3 P-6 
- -20 
22 
222 - 
-16 
09 
-.09 


P-2 
222 
-16 
29 
229) - 
19 
-.06 


P-4 
224 
-09 
-19 
222 
222 - 

07 


P-1 
-O2 
-.09 
-.06 
07 
-16 
-16 - 


Pool 
-63 
-70 
20 
«30 
-41 
-60 


M.A. 
-00 
-00 
-00 
-10 

-.05 
03 





-20 
222 
+24 
02 
































Reliability (correlated halves of battery: P - 3, 6, 4 vs. P - 5, 2, 1): r = .48
Number of cases: 34





K. H. Rogers 





[Figure 1: distribution of tetrad-differences; horizontal axis from -.12 to +.12 in steps of .02]

Figure 1. Distribution of Tetrad-Differences from Table I.


Number of tetrads: 90
Average of tetrad-differences: .025
PE of tetrad-differences: .022
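The tetrad-differences summarized in Figure 1 can be enumerated mechanically from any correlation table. The Python sketch below is an illustration, not the author's procedure: the helper names are hypothetical, and because the entries of Table I are not reliably legible here, it uses a made-up matrix generated from a single common factor (each r equal to the product of two loadings), for which every tetrad-difference is zero. Each set of four variables yields three tetrad-differences, so six tests give 3 × 15 = 45 distinct values; the count of 90 reported above may reflect a different counting convention (for example, entering each difference with both signs), which the article does not state.

```python
from itertools import combinations
from statistics import median


def one_factor_matrix(loadings):
    """Correlation matrix implied by a single common factor: r_ij = l_i * l_j."""
    n = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


def tetrad_differences(r):
    """All three tetrad-differences for every set of four variables in r."""
    diffs = []
    for i, j, k, l in combinations(range(len(r)), 4):
        a = r[i][j] * r[k][l]
        b = r[i][k] * r[j][l]
        c = r[i][l] * r[j][k]
        diffs.extend([a - b, a - c, b - c])
    return diffs


r = one_factor_matrix([0.8, 0.7, 0.6, 0.5, 0.4, 0.3])  # hypothetical loadings
diffs = tetrad_differences(r)
print(len(diffs))  # 45
print(median(abs(t) for t in diffs) < 1e-12)  # True: one factor leaves no tetrad-differences
```

With observed correlations, the median of the absolute tetrad-differences would then be set beside the probable error of the tetrad-differences, following the Spearman criterion cited in the footnotes; that probable error requires Spearman's own formula, which is not reproduced here.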


Discussion of Results 

The intercorrelations shown in Table I are not high but, with two exceptions, are positive. Each test except P - 1 shows significant correlation with one or more tests of the battery. We are dealing throughout with uncorrected scores. The correlations of the individual tests with the "pool" are all significant, especially in view of the probable range of talent for the group. Further, the mean of the tetrad-differences (.025) approximates the probable error of the distribution (.022).² There seems to be evidence for the presence in these scores of a factor common to all variables. But this factor is quite different from "g": in addition to the fact that the correlations between the separate tests and Mental Age are quite negligible, there is the evidence of the following tetrad:





1. It is interesting to note the comments of Pinard (4) in this respect:

"It was evident to the most superficial observer, that without introducing a strong motive for performing these tests, the result could never be a proper measure of perseveration. One had to but study their method of procedure to be convinced of this fact.....

"The experimenter tried various ways to interest them in their work. On the advice of the staff, who had predicted trouble of this kind, the children were told that the tests were games, but their disappointment was colossal, and after this they frankly ridiculed the idea. Then their cooperation was asked for the sake of science; this called forth a better response, but the effort never lasted for more than a few minutes. To stimulate a competitive spirit, the score of the total output of work was posted up from day to day, but many of the more difficult children were not interested. It was not until the writer had taken them out in small parties on the lakes and for rambles in the woods and had joined in their games and sports, when the personal factor was introduced and they tried to please him, that some amount of interest was created and sustained effort and speed was assured." (pp. 10-11.)

2. A criterion for equi-proportionality throughout a table of intercorrelations is that the median tetrad-difference and the probable error (of the tetrad-differences) are about equal. Reference, Spearman (6), pp. 140-1.

























Tetrad of M.A., "P" (b),¹ Academic Standing and "P" (a):
Tetrad-difference = .308
PE (of the tetrad) = .08

II. Relationship Between Scores on the "Perseveration" Tests and Other Measures

1. Quantitative Procedures

For reasons set forth in the preceding paragraph, it was decided to take as a tentative measure of "perseveration" the total score on the "pool" of tests. This seemed justified by the indications (given above) of a common factor throughout these measures that was different from "g". For convenience, we are here designating this composite score as indicative of "perseveration."

The next consideration was the possible clarification of this factor, "perseveration," by comparing the scores thus obtained with other measures. For this purpose, the following tests were selected: (a) Stanford Revision, Binet-Simon; (b) Academic Scores (composite for all subjects); (c) Downey Will-Temperament (individual method); (d) Speed 1, a battery of tests involving simple perceptual relations (3); (e) Speed 2, the first section of the "perseveration" tests themselves, from which a measure of apparently simple speed of execution could be derived by taking the number of items recorded in a given time.

The correlations between the scores on these tests are given in Table II.

TABLE II
INTERCORRELATIONS

             "P"    W-T    M.A.   Acad.  Speed 1  Speed 2
            (.48)  (.43)    --     --     (.92)    (.32)
"P"           -     .16    .21    .19     .16      .38
W-T          .16     -     .15    .40     .20      .17
M.A.         .21    .15     -     .70     .66      .42
Acad.        .19    .40    .70     -      .97      .23
Speed 1      .16    .20    .66    .97      -       .03
Speed 2      .38    .17    .42    .23     .03       -

No. of cases: 34

Since equi-proportionality does not appear to obtain throughout this table, any interpretation of the results necessitates further analysis. The following tetrad gives no real [...]:

                    M.A.   Speed 1 (a)
Acad.               .70       .74
Speed 1 (b)         .68       .92
Tetrad-difference             .148
PE (of the tetrad)            .055

A similar situation pertains when Speed 2 is compared with Mental Age and Academic Standing:

                    M.A.   Speed 2 (a)
Acad.               .70       .09
Speed 2 (b)         .28       .32

With Will-Temperament, the corresponding relationships are as follows:

                    M.A.   Will-Temperament (a)
Acad.               .70       .39
W-T (b)             .12       .43

While there is here some suggestion of a group factor, the degree of overlap is not very marked. A like comparison for "perseveration" gives a somewhat more significant tetrad-difference, as was indicated above, where the tetrad-difference was seen to be .308 and the PE (of the tetrad) .08. This indication would, of course, be more easily interpreted with a larger group of subjects and more reliable "perseveration" tests, in which case the tetrad-difference would be increased and the probable error decreased.

From the values in Table II, the most significant tetrad-difference is the following:

1. "P" (a) and "P" (b) refer to two halves of the battery of "P" tests.







                    Academic   Speed 2
"P"                   .19        .38
Speed 1               .97        .03
Tetrad-difference                .363
PE (of the tetrad)               .092

Other possible tetrads with these four variables are the following:

                    "P"      Acad.
Speed 1             .16       .97
Speed 2             .38       .23
Tetrad-difference             .332
PE (of the tetrad)            .09

                    Acad.    Speed 1
"P"                 .19       .16
Speed 2             .23       .03
Tetrad-difference             .031
PE (of the tetrad)            .11


It will be noted that a significant¹ tetrad-difference does not obtain unless the correlation between Speed 1 and Academic Standing is involved; and since the Academic scores have proved to be highly "g"-saturated,² we may contend that the above suggested "perseveration" factor is certainly not due to "g".
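The three tetrads above can be recomputed directly from the Table II correlations, together with the significance rule stated in the footnotes (a tetrad-difference of at least three times its PE). This Python sketch takes the PE values as printed rather than deriving them; the variable keys and helper names are hypothetical.

```python
# Correlations from Table II (34 cases); keys are unordered pairs of variables.
R = {
    frozenset({"P", "Acad"}): 0.19,
    frozenset({"P", "Speed1"}): 0.16,
    frozenset({"P", "Speed2"}): 0.38,
    frozenset({"Acad", "Speed1"}): 0.97,
    frozenset({"Acad", "Speed2"}): 0.23,
    frozenset({"Speed1", "Speed2"}): 0.03,
}


def r(a, b):
    return R[frozenset({a, b})]


def tetrad(row1, row2, col1, col2):
    """Tetrad-difference of a 2 x 2 display: r(row1,col1)*r(row2,col2) - r(row1,col2)*r(row2,col1)."""
    return r(row1, col1) * r(row2, col2) - r(row1, col2) * r(row2, col1)


def significant(td, pe):
    """Criterion used in the article: at least three times the PE of the tetrad."""
    return abs(td) >= 3 * pe


# (rows, columns, printed PE) for the three displays above.
displays = [
    (("P", "Speed1"), ("Acad", "Speed2"), 0.092),
    (("Speed1", "Speed2"), ("P", "Acad"), 0.09),
    (("P", "Speed2"), ("Acad", "Speed1"), 0.11),
]

for rows, cols, pe in displays:
    td = tetrad(*rows, *cols)
    print(rows, cols, round(abs(td), 3), significant(td, pe))
```

The printed magnitudes are .363, .332 and .031. Only the first two displays, both of which contain the .97 correlation between Speed 1 and Academic Standing, meet the three-PE criterion, which matches the observation in the text.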

Reviewing the indications of this investigation, we would recall two points mentioned in an earlier study,³ namely, that with subnormal subjects "g" undoubtedly plays a highly significant part in determining individual differences, and that unusual exercises such as those involved in "perseveration" tests demand an exceedingly careful technique on the part of the examiner. This latter point has been frequently reiterated by other investigators.⁴ Viewed in the light of the difficulty of isolating measurable variables other than "g" in this type of situation, therefore, it was felt that the indication of a "perseveration" factor given above might be sufficiently definite to warrant further study. The same may possibly be true of the factor mirrored less clearly in connection with the Will-Temperament Test, especially since some like indications have been reported by other workers.⁵ The results obtained with the Will-Temperament tests are being examined in connection with an independent study.


2. Some Qualitative Procedures 

An attempt was made to relate "perseveration" scores to personality differences. The first procedure made use of the Rosanoff classification of personality "types."⁶ Four psychological workers cooperated in this task.⁷ Full descriptions of the "types" and subtypes were carefully studied by the examiners, and a number of trial cases were freely discussed and classified. In appraising the selected group, the examiners worked separately, but always in the same general situation, namely, a large examining room into which each subject was conducted. There he was asked to perform one or two small tasks and then, seated at a table, to answer a number of questions concerning himself and his daily routine. It was intended that this behavior of the subject would offer a common basis for the separate judgments of the examiners.





1. That is, at least three times the PE of the tetrad.
2. See previous study: K. H. Rogers, Journal of Experimental Education, II (September, 1933), p. 36.
3. Ibid.
4. Pinard especially (4); refer to footnote 1, p. 303; see also Cattell (1).
5. The studies of Oates (British Journal of Psychology, XIX, 1928; XX, 1929) respecting the functional basis of the Downey Will-Temperament tests are of interest here. Oates followed his application of the tests with a searching statistical and systematic analysis of the results. In the first study, he concluded that the criterion for a general factor operating throughout the tests was not satisfied, but that there appeared to be a group factor of wide generality. The second study, centering its interest on the nature of this factor, finds that there is a general temperamental or emotional "factor" entering into the Will-Temperament tests, which may be closely related to "general emotionality."
6. This author (Rosanoff: Psychological Bulletin, XVII, 1920) presents an interesting attempt to relate varieties of normal personality to psychotic or "insane" types, maintaining that normal personalities differ from psychopathic ones quantitatively rather than qualitatively. His "types" may be outlined in the following manner:
(1) Anti-social type.
(2) Cyclothymic type: Manic, Depressive, Irascible and Unstable make-up.
(3) Shut-in type (Autistic).
(4) Epileptic type.
7. The writer is pleased to acknowledge his indebtedness to Miss Olive Russell, Miss Mary Marshall, and Mr. Jack Steer for assistance in this and other situations; also, to express thanks to Dr. S. Horne, Superintendent of the Ontario Hospital, Orillia, for permission to use pupils of the school as subjects throughout this investigation.























































































































































































































































Each examiner had previous acquaintance with 
the subjects, having had them at "testing" 
interviews on at least two occasions. Also, 
the workers had reports on the subjects from 
two of their teachers and from ward super- 
visors. The social and developmental his- 
tories for this group of children had been 
written up and offered further background 
information. Appraisals were made for all 
cases. Then, the nine cases falling in each 
of the lower and upper quartile ranges of 
the "perseveration" scores were contrasted. 
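With 34 cases, a quarter is 8.5 cases, which the study rounds to nine at each end. A minimal Python sketch of that selection, assuming the cases are simply ranked by their "pool" scores with ties broken arbitrarily (the article does not say how ties were handled); the case labels and scores are hypothetical.

```python
from __future__ import annotations


def quartile_extremes(scores: dict[str, float], k: int = 9) -> tuple[list[str], list[str]]:
    """Return the k lowest-scoring and the k highest-scoring cases."""
    ranked = sorted(scores, key=scores.get)  # ascending by score
    return ranked[:k], ranked[-k:]


# Hypothetical "pool" scores for 34 cases.
scores = {f"case{i:02d}": float(i) for i in range(34)}
low, high = quartile_extremes(scores)
print(low[0], low[-1], high[0], high[-1])  # case00 case08 case25 case33
```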

Figure 2 presents a graphical arrange- 
ment of the results obtained. In several 
instances a rater found it necessary to 
place an individual under two “types"; this 
was discouraged but not prohibited. The 
graph includes all judgments. 





[Figure 2: judgments for the Low "Perseveration" and High "Perseveration" groups]

Figure 2. Graphic Distribution of All Judgments.


The following tendencies seem to be suggested by the data. Instability ("swing back and forth between opposite poles of emotion", and alternating moods) and Autism (tendency toward emotional apathy and dissociative phenomena, a reduction of external interests leading to vague detachment from reality, and shyness) tend to be common to both high and low perseverators, in so far as either is a characteristic of any one individual; if either should be conditioned by one tendency rather than the other with respect to perseveration, it would tend to be favored by low perseveration. Rather more definitely suggested is the tendency for the high perseverators to display "antisocial" characteristics. There is evidence that "manic" features are favored by low perseveration.

We might further characterize the high perseverators, since they incline toward the depressive make-up, as exhibiting a special sensitiveness to the cares, troubles and disappointments of life; they take things hard, feel unpleasantnesses, and tend toward physical and mental despair. This general condition is the setting in which any unstable and autistic tendencies have their play. On the other hand, the low perseverators tend to be characterized by a disposition toward hysterical manifestations, both mental and physical, which may show itself in social life in lying, malingering and thieving, and other antisocial responses to restraint; this constitution would favor outbursts of temper and impulsiveness. In our immediate experimental situation, of course, these general descriptions are to be tempered by the fact that we are dealing with mental deficients, with no evidence of developed dementia, and that our subjects are few in number.

Estimations respecting extrovertive and introvertive tendencies were made on a five-point scale. Each case was located by the four examiners working separately, on the basis outlined above. The results are summarized in Table III.





TABLE III

"PERSEVERATION" AND EXTROVERTIVE-INTROVERTIVE TENDENCIES (FREQUENCY OF JUDGMENTS ON A FIVE-POINT SCALE)

Scale point        (1)    (2)    (3)    (4)    (5)
             (Extrovertive)             (Introvertive)
Low "P"             8      4      7     12      5
High "P"            7      8      7     10      4




















There is no indication of a relationship between this interpretation of Extroversion-Introversion and the differences derived from the "P" tests; this agrees with other findings (4).
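One quick way to see why Table III shows no relationship is to compare the average scale position of the judgments in each group. The Python sketch below does this; it assumes the High "P" entry at point (3) is 7, the value that gives that row the same total as the Low "P" row (36 judgments, nine cases times four examiners). Because the same cases are judged by four examiners, the 36 judgments are not independent, so the comparison is descriptive only.

```python
SCALE = [1, 2, 3, 4, 5]  # 1 = extrovertive ... 5 = introvertive

low_p = [8, 4, 7, 12, 5]
high_p = [7, 8, 7, 10, 4]  # third entry assumed so that the row totals 36


def mean_position(freqs):
    """Frequency-weighted mean scale position."""
    return sum(point * f for point, f in zip(SCALE, freqs)) / sum(freqs)


print(sum(low_p), sum(high_p))  # 36 36
print(round(mean_position(low_p), 2), round(mean_position(high_p), 2))  # 3.06 2.89
```

Both means sit near the middle of the scale and differ by less than a fifth of a scale point, which is consistent with the finding stated above.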

It was mentioned above that social histories had been secured on each of these subjects. A brief summary was made of these under the following categories: (1) general behavior characteristics, (2) recorded misdemeanors, and (3) family history. While the pictures presented under the two divisions, High Perseveration and Low Perseveration, suggested some interesting differences, it was felt that the data did not warrant any definite conclusions. These differences, it appears, rather than providing a basis for positive assertions about personality differences with respect to perseverative function, disclose a number of important problems on this topic. For example: Have differences in family background any relation to the presence of perseveration as a functional factor? Is there a causative relation between family situation and the functional tendency toward either high or low perseveration in the persons that have to adjust to this situation? Such queries are undoubtedly of considerable pertinence, as they call in question Lankes' statement respecting the innateness of perseverative differences (2).


CONCLUSIONS 


(1) A preliminary study indicated that scores on tests of "perseveration" paralleled certain marked personality differences. This led to a more extensive exploration in a larger group of subjects.

(2) The usual tests of "perseveration" required considerable simplification and had to be administered individually when subnormal school children were used as subjects.

(3) In the results obtained from the application of a battery of "P" tests, there was evidence for the presence of a common factor underlying these measures. This factor proved to be quite different from "g" ("intelligence"); throughout this study, it is spoken of as "perseveration."

(4) In a table of intercorrelations between "perseveration", Will-Temperament, two speed tests, mental age ratings and academic achievement scores, equi-proportionality did not obtain.

(5) An analysis of pertinent tetrads shows a significant tetrad-difference for "perseveration"; there was also some suggestion of a Will-Temperament group factor.




(6) Certain qualitative procedures evi- 
denced a relationship of "perseveration" 
with personality differences, but no rela- 
tionship with Extroversion-Introversion 
tendencies. Differences in the social his- 
tories of subjects representing the extremes 
of the perseveration-score distribution in- 
dicated an important aspect of the problem 
for further investigation. 


SUMMARY 


This study was primarily a search for 
records that indicate variables in personal- 
ity that could be investigated eventually by 
the factorial analysis technique. The sub- 
jects were a group of subnormal children. A 
preliminary study directed the investigation 
toward a more intensive consideration of 
"perseveration"--i.e., Spearman's "P" factor.
This was isolated and then studied in rela- 
tion to (a) certain quantitative procedures, 
namely: Will-Temperament test scores, speed 
test scores, mental-age ratings and academic 
achievement marks; (b) certain qualitative 
procedures, namely: differences in personal- 
ity “types", Extroversion-Introversion tend- 
encies, and social background. 


REFERENCES 


(1) Cattell, R. B. "Temperament Tests. II. Tests," British Journal of Psychology, (1933), pp. 20-46.

(2) Lankes, W. "Perseveration," British Journal of Psychology, (1915), pp. 387-419.

(3) Line, W. and Kaplan, "The Existence, Measurement and Significance of a Speed Factor in the Abilities of Public School Children," Journal of Experimental Education, I (September, 1932), pp. 1-8.

(4) Pinard, J. W. "Tests of Perseveration," British Journal of Psychology, (1932), pp. 5-19; pp. 114-126.

(5) Rogers, K. H. "Intelligence and Perseveration Related to School Achievement," Journal of Experimental Education, II (September, 1933), pp. 35-43.

(6) Spearman, C. The Abilities of Man (New York: Macmillan Company, 1927), pp. xxxiii and 415.


APPENDIX
TESTS OF "PERSEVERATION"


P - 2. Pied Type--underlining "at."

The subjects were given a page 
containing 3 paragraphs of pied 
type. The first paragraph had the 
function of a practice exercise. The 
construction of the first two para- 
graphs was similar in that the two 
letters "at" appear together at reg- 
ular intervals throughout. The third 
paragraph contained the same number 
of "at"s as the second, but present- 
ed at irregular intervals. 

The subjects were instructed to 
underline “at" each time they came 
to it. One minute was allowed for 
each of the second and third sec- 
tions. The task was performed “as 
rapidly as possible" and with the 
caution "to be sure not to miss any." 

The test was fully explained beforehand. The signal to commence the second paragraph was: "Ready--Mark every 'at' as you come to it, and be sure not to miss any--go." The transfer from the second to the third paragraph followed the signal, "Change."

P - 3. Pied Type--crossing out "a"-"c" and "d"-"z."

The subjects were given a page 
containing 2 paragraphs of pied type. 
These paragraphs differed in this 
respect only, that where "a" appeared 
in the first, "d" appeared in the 
second, and "z" in the second re- 
placed the "c"s of the first. 

The subjects were instructed to cross out all the "a"s and "c"s in the first paragraph and the "d"s and "z"s in the second. The task was performed "as rapidly as possible", the subjects being allowed one minute for each paragraph.

The test was fully explained 
before being given. The signal to 
commence was "Ready--go"; and they 
changed over from the first to the 
second paragraph at the signal 
"Change." 







Number--letter test. 
(a) The subjects were instruct- 
ed to write the numbers, 


123456789 


as rapidly as possible for one min- 
ute. 
(b) Then, to write the letters, 


abcdefghi 


as rapidly as possible for one min- 
ute. 

(c) And finally, the numbers 
and letters alternately, as follows: 


1a2b3c4d5e6f7g8h9i


as rapidly as possible for two min- 
utes. 

The whole test was first ex- 
plained carefully before being given. 
The subjects commenced at “Ready-- 
go." After the minute, the examiner 
directed, "Change--now the letters-- 
go." Then, following the second 
minute, "Change--now both, the num- 
bers and letters--go." 

Triangles Test (cf. Bernstein 
(British Journal of Psychology, Mono- 
graph Supplement, 1924, 3)). 

(a) The subjects were instructed to draw a series of triangles--apex upwards--as fast as possible for 30 seconds; thus: △△△△△△△△△

(b) A similar series was then drawn, with apex downwards, as fast as possible for 30 seconds; thus: ▽▽▽▽▽▽▽▽▽

(c) The subjects repeated (a) and (b) twice.

(d) Finally the subjects drew as rapidly as possible for three minutes the following series (alternately upward and inverted triangles): △▽△▽△▽△▽△▽

The whole test was explained 
(on the blackboard for the groups) 
before being given. The commencing 
signal was, "Ready--go." After each 


30 second period “Change” was called 
out; and the subject commenced the 







next set of triangles. Preceding the last section, the directions were, "Change--now, both together, first one with the point up, then one with the point down, then one with the point up, and so on. Ready--go; just as fast as you can."

"ea" Test.
(a) The subjects were instructed to write the word "redeemer" as often as possible for one minute.
(b) Then to write this same word but inserting an "a" after each "e"--as follows: "readeaeamear"; as often as possible for one minute.
(c) And finally to write the units of (a) and (b) alternately as rapidly as possible for two minutes.

The whole test was carefully explained at the blackboard before being given. The subjects commenced at "Ready--go." The word "Change" was sufficient to direct them to each of the successive operations.

Scoring the "P" Tests.
The method of scoring was that given by Lankes (2). The number of correct responses under conditions of interference was subtracted from the number of correct responses under normal conditions, and the result was divided by the latter.






















AN ANALYSIS OF THE SCORES OF EIGHTH-GRADE PUPILS AND NORMAL SCHOOL STUDENTS ON CERTAIN OBJECTIVE TESTS


by


C. C. Upshall and Harry V. Masters
Washington State Normal School
Bellingham, Washington


For a number of years the State Normal School at Bellingham has administered a battery of tests to its entering students. This battery of entrance tests includes the Thorndike Intelligence Examination for High School Graduates and a test in each of the following fields: arithmetic computation, arithmetic reasoning, English usage, spelling, geography, history and penmanship. All of these tests except the Thorndike Intelligence Examination and the penmanship test have been prepared by the Bureau of Research of the State Normal School at Bellingham. These tests have not been published, and great care is taken that no copy is left in the hands of the students. Care is also taken to prevent direct coaching.

The arithmetic computation and reasoning tests are similar in form and content to the tests of the same name in the New Stanford Achievement Test. The English usage test includes four sections: Punctuation, Good Use, Grammar, and Sentence Structure. The first, second, and fourth sections require a decision regarding the rightness or wrongness of the sentences presented. The section on grammar is divided into two parts: Part A tests recognition of correct verb forms, and Part B requires the naming of parts of speech presented in sentences. The history test samples the subject of American history, and the geography test samples the field of geography. The number of equivalent forms, number of items, working times and reliabilities are given in Table I.

In arithmetic computation, arithmetic reasoning, English usage, and spelling, a specified grade must be attained before students are allowed to do their practice teaching. The upper limit of this grade is equivalent to the score one-half sigma below the mean score of the group which enters the institution for the first time in the fall quarter. In penmanship the Ayres Scale is used, for which norms have been established on the basis of the achievement of entering students. Students who fail on the penmanship test must take a course in this subject. No standards are set for geography and history, and no retests are required in these fields.
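The passing standard described above is a simple function of the entering group's score distribution. A minimal Python sketch, assuming "sigma" means the population standard deviation (dividing by N, as was customary at the time) of the fall-quarter entrants' scores; the function name `passing_standard` and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def passing_standard(entering_scores):
    """Score one-half sigma below the mean of the entering group."""
    return mean(entering_scores) - 0.5 * pstdev(entering_scores)


print(passing_standard([10, 20]))  # 12.5: mean 15, sigma 5
```

Because the standard is tied to each entering group, it moves with the group's own mean and spread rather than being fixed in advance.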

The main problems of this study are: 

1. In what direction and how signifi- 
cant are the differences between the scores 
made by the eighth-grade pupils on the tests 
and the scores made on these same tests by 
students entering the Normal School? 

2. In what direction and how signifi- 
cant are the differences between the scores 
made by the eighth-grade pupils and the 
scores made by Normal School students two 
weeks before their graduation? 

3. In what direction and how signifi- 
cant are the differences between the scores 
of students upon entrance to the Normal 
School and at the end of six quarters of at- 
tendance? 

4. What gain in score on the entrance tests is made by the graduating class during their six quarters of attendance in the Bellingham Normal School?

5. What changes are revealed, at the completion of six quarters' work, in the variability of the scores on each of the tests administered?

6. What are the factors contributing to 
the gains in score made during the two-year 
normal course? 

In May, 1930, 125 students who had entered in the fall quarter of 1928 and who were candidates for graduation in June, 1930, were retested in the same fields (with the exception of spelling and handwriting) in which they were tested upon entering the institution. The same forms of the tests were given both times except in arithmetic reasoning and arithmetic computation, where equivalent forms were used. From previous use of the duplicate forms used in this study, the differences in difficulty and range of the forms were known to be small, so that their effect upon the differences shown in Tables III and IV should be insignificant. The students were given only three days' notice that they were to take the entrance tests again, as a precaution against special preparation.
The extent to which the total group of entering freshmen in September, 1928 is a random sample of all freshmen entering the State Normal School at Bellingham is difficult to determine. On the basis of the similarity of mean scores and standard deviations of the entrance tests for all entering groups between 1927 and 1933, it is believed that the group of freshmen who entered in September, 1928 may be considered a random sample until basic changes have been made in the high school requirements and the conditions determining entrance to the State Normal School at Bellingham. Since the 125 students represent all of the freshmen entering in the fall of 1928 who were ready to graduate at the end of six quarters, the question arises as to the extent to which they are a random sample of all entering students who are ready to graduate after six quarters. In this study it is assumed that they are a random sample, on the basis of the lack of evidence that there have been any basic changes in the curriculum and entrance requirements.¹ Moreover, the means and standard deviations of the college aptitude test given to a similar group in 1929 and 1931 were decidedly similar to those of the group which entered in 1928 and graduated in 1930.
The tests mentioned above were also given in February, 1930, to pupils who had entered the eighth grade in September, 1929. These pupils were chosen from both large and small cities in the State of Washington. Where there was more than one group to be tested, each superintendent was asked to give the tests to his average group of eighth-grade pupils. These methods were used in an attempt to obtain a random sample of eighth-grade pupils in Washington; there is no proof of the extent to which this aim was achieved.

In order to interpret properly any dif- 
ferences which might appear in the test re- 
sults, the accuracy of measurement of the 
various tests must be considered. Table I 
gives the reliability of each of the tests 
that were used in this study. The reliabil-
ity for each of the achievement tests was
secured by correlating chance halves and cor- 
recting by the Spearman-Brown formula. The 
reliability of the Thorndike test is that 
given by Wood.²


TABLE I

TESTS GIVEN TO STUDENTS ENTERING THE
STATE NORMAL SCHOOL AT BELLINGHAM

Name of Test                  Reliability   Minutes of      No. of
                                            Working Time    Items
Thorndike (examination for
  high school graduates)
History
Geography
English Usage
Arithmetic Reasoning              .72
Arithmetic Computation            .82


In view of the working time of the tests 
and the relatively select group to which 
they were administered, the reliability co- 
efficients for the tests in history, English 
usage, and geography may be considered as 
quite high. The reliability coefficients of 
the two arithmetic tests are somewhat lower. 
This may be accounted for, in part at least, 
by the limited number of items in the tests 
and the shorter working times. 
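The suggestion that test length accounts for the lower arithmetic reliabilities can be illustrated with the general Spearman-Brown prophecy formula. The sketch below uses the reliabilities reported for these two tests in the summary (.72 and .82); the doubling factor is hypothetical, and the formula assumes that any added items would be parallel to the existing ones.

```python
def spearman_brown_prophecy(r: float, n: float) -> float:
    """Predicted reliability of a test lengthened n times with parallel items."""
    return n * r / (1 + (n - 1) * r)


# Reliabilities of the two arithmetic tests as reported in this study.
reported = {"Arithmetic Reasoning": 0.72, "Arithmetic Computation": 0.82}

for test, r in reported.items():
    # Hypothetical: the reliability predicted if each test were doubled in length.
    print(f"{test}: {r:.2f} -> {spearman_brown_prophecy(r, 2):.2f}")
```

Under these assumptions doubling would raise the two tests to about .84 and .90, close to the .85 to .95 range of the longer tests.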

In Table II are given coefficients of
correlation which were computed primarily
for use in the formula for the probable error
of a difference between correlated means, in
which r appears.
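The role of r in that formula can be sketched as follows. The block assumes the classical expressions PE_M = .6745 sigma / sqrt(N) for the probable error of a mean and PE_D = sqrt(PE_1^2 + PE_2^2 - 2 r PE_1 PE_2) for the probable error of a difference between correlated means (r = 0 for independent groups). The article does not print the exact variant it used, so the figures are illustrative. With the Thorndike values for the 125 graduating students (Table III) and r = .80 (Table II), the ratio comes out near 29, in line with, though not identical to, the 29.0 reported in the text.

```python
import math

PE_FACTOR = 0.6745  # one probable error = 0.6745 standard errors


def pe_mean(sigma: float, n: int) -> float:
    """Probable error of a mean."""
    return PE_FACTOR * sigma / math.sqrt(n)


def pe_difference(pe_1: float, pe_2: float, r: float = 0.0) -> float:
    """Probable error of a difference between two means; r = 0 for independent groups."""
    return math.sqrt(pe_1**2 + pe_2**2 - 2 * r * pe_1 * pe_2)


# Thorndike scores of the same 125 students at entrance (1928 G) and before graduation (1930).
pe_entrance = pe_mean(9.02, 125)
pe_graduation = pe_mean(8.40, 125)
d = 127.94 - 118.31

ratio_with_r = d / pe_difference(pe_entrance, pe_graduation, r=0.80)  # r from Table II
ratio_ignoring_r = d / pe_difference(pe_entrance, pe_graduation)  # treats the groups as independent
print(round(ratio_with_r, 1), round(ratio_ignoring_r, 1))
```

Ignoring r would cut the ratio to about 13, which is why the correlations were needed for comparisons of the same students tested twice.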





1. Students may no longer graduate at the end of six quarters. Graduation requirements have been increased so that nine quarters are required to graduate. This requirement took effect in 1933.
2. Ben D. Wood, Measurement in Higher Education (Yonkers-on-Hudson, New York: World Book Company, 1923), p. 45.







312 


JOURNAL OF EXPERIMENTAL EDUCATION 





Volume II, No. 3





















































































































































































TABLE II

CORRELATIONS BETWEEN THE SAME TESTS GIVEN
IN OCTOBER, 1928 AND IN MAY, 1930 TO
NORMAL SCHOOL STUDENTS

Name of Test              Pearson Coefficient
                          of Correlation
Thorndike                       .80
Arithmetic Reasoning            .52
Arithmetic Computation
English Usage
History
Geography


TABLE III

MEANS, STANDARD DEVIATIONS, AND NUMBER
OF CASES ON VARIOUS TESTS

Name of Test        Group       Form    Mean    Sigma   N
Thorndike           1928 A      E & J   115.88  10.23   324
                    1928 G      E & J   118.31   9.02   125
                    1930        E & J   127.94   8.40   125
Arith. Reas.        8th grade*  A        10.54   3.89   497
                    1928 A      B        12.31   3.86   347
                    1928 G      B        13.22   3.34   125
                    1930        A        15.76   2.73   125
Arith. Comp.        8th grade   C        23.13   4.69   468
                    1928 A      D        21.05   5.66   347
                    1928 G      D        22.78   5.31   125
                    1930        A        26.48   4.22   125
English Usage       8th grade   A        58.47  15.72   507
(scored in terms    1928 A      A        43.83  16.04   356
of number of        1928 G      A        36.19  14.35   125
errors)             1930        A        32.50   9.56   125
History             8th grade   A        55.08  18.48   541
                    1928 A      A        72.49  19.49   348
                    1928 G      A        76.79  17.83   125
                    1930        A        93.88  20.10   125
Geography           8th grade   A        39.71  16.08   442
                    1928 A      A        55.52  17.85   349
                    1928 G      A        58.06  15.25   125
                    1930        A        68.45  17.26   125

*8th grade--indicates the total group of eighth-grade pupils.
1928 A--indicates the scores made by the total group of students who took the tests in the fall of 1928.
1928 G--indicates the scores made in September, 1928, by the group of 125 students who entered as freshmen in September, 1928, and who graduated in June, 1930.
1930--indicates the scores made in May, 1930, by the group of 125 students who entered as freshmen in September, 1928, and who graduated in June, 1930.

In Table III are given the forms of the tests used, the means, the number of students taking the tests, and the standard deviations of the scores obtained by three different groups of students. The 1928 G and 1930 results represent the results of tests given to the same group at two different times, i.e., at entrance and just before graduation. The variability of the 1928 scores is greater in the case of each of the subjects in which no courses are offered for credit in the Normal School. (The courses which are offered in English and mathematics are not of such a nature that they would contribute directly to increasing the scores on the entrance tests at the completion of the two-year course.) No courses are offered in English composition or grammar in any of the curricula.¹ In arithmetic reasoning the standard deviation in 1928 was 3.34, whereas in 1930 it had decreased to 2.73. In arithmetic computation the standard deviation was 5.31 in 1928 and decreased to 4.22 in 1930. In English usage the standard deviation in 1928 was 14.35; in 1930 it had decreased to 9.56. In history and geography, however, in which subjects courses are offered which should increase the students' knowledge of the material measured by these tests, the standard deviation was greater in 1930 than in 1928. For example, in history the increase in standard deviation was from 17.83 in 1928 to 20.10 in 1930. All of the differences between the standard deviations of the tests taken by the graduating group are statistically significant except in the case of the Thorndike examination. The variabilities of the scores


made by the eighth-grade pupils show no con- 
sistent relationship to the variabilities of 
the other three groups. 
In arithmetic reasoning, arithmetic 
computation, and English usage no special 
pressure is exerted on those who meet the 
requirements for practice teaching in the 
entrance tests to cause them to devote spe-
cial study to these subjects. However, in
the case of those who, on the entrance tests,
fail to attain the minimum requirements for 





1. An elective course in composition has been offered by the English department since this study was made. 




March, 1934 


practice teaching, sufficient study must be 
done to enable them to meet the require- 
ments in a retest. (A recent ruling limits 
the retests to three in a given subject.) We 
might, therefore, expect as a result of this 
required study a greater gain by those who 
took retests than by those who passed on the 
first test. Table V shows that those re- 
quired to take the retests made the greater 
gain in score, but the extent to which the
gain is the result of definite direct study 
is not known. 

Table IV gives the difference between the means of the tests taken by the eighth grade and the means of these same tests taken by (a) the total group which entered the institution for the first time in September, 1928, and (b) the group which was ready for graduation after six quarters. It also gives (1) the differences between the means of the tests taken by the total entering group and the means of the tests taken by the group which was about to graduate, and (2) the differences between the means of the tests taken by the graduating group upon entering the institution and after six quarters at the Normal School. There was a statistically significant difference between the eighth-grade mean and the mean of all entering students (1928 A group) in each of the tests. The average entering normal-school student was significantly inferior to the average eighth-grade pupil in arithmetic computation, but in each of the other tests he was significantly superior.

Table IV also shows that there is a large and significant difference between the scores of eighth-grade children and the scores made by students just before graduating from a two-year normal course. These graduating students received better average scores than the eighth-grade pupils even on the arithmetic computation test. The scores received by graduating students are better than those received by the total entering group. When we compare the scores made at entrance and just before graduating by the 125 students who were graduated at the end of six quarters, the latter are distinctly superior. These differences are all six or more times the probable error of the difference. If another similar sampling from the same total population were used, the differences would almost certainly be in the same direction.

The entering group of students received a significantly lower score on the arithmetic computation test than the eighth-grade pupils received. This is probably because less practice is given in this subject in high school, so that skill in it has declined.


TABLE IV

ANALYSIS OF DIFFERENCES BETWEEN THE MEANS OF INDICATED GROUPS

Columns:
(1) D between 8th-grade mean and mean of entering group (1928 A)
(2) D between 8th-grade mean and mean of graduating group
(3) D between mean of entering group (1928 A) and mean of graduating group (1928 G)
(4) D between means of graduating group (1928 G) on entrance and on graduation (1930)‡

Rows (tests): Thorndike; Arith. Reas.; Arith. Comp.; English Usage†; History; Geography

*Difference is in favor of the eighth grade.
†Scored in terms of the number of errors.
‡The long formula was used for calculating the probable errors in this column.


TABLE V

COMPARISON OF GAINS MADE BY THOSE OF THE
GROUP OF 125 GRADUATING STUDENTS WHO MET
THE REQUIREMENTS FOR PRACTICE TEACHING ON
THEIR FIRST TEST AND THOSE WHO DID NOT

                 Those who met          Those who did not
                 requirements on        meet requirements
                 first test             on first test
                 Mean    Gain in        Mean    Gain in
Name of Test     Gain    terms of       Gain    terms of
                         sigma of               sigma of
                         1928 test              1928 test
English Usage    1.86    .13            22.42   1.56
Arith. Reas.     1.68    .50             4.71   1.41
Arith. Comp.     4.00    .75             8.88   1.67
Mean                     .46                    1.55





The mean gain in score for those who did and 
for those who did not meet the minimum re- 
quirements for practice teaching was comput- 
ed for English usage, arithmetic reasoning, 
and arithmetic computation. Each of the 
gains was then expressed in terms of the sig- 
mas of the 1928 tests. Table V gives these 
data. Those who did not meet the require- 
ments for practice teaching on their first 
attempt gained, on the average, one and one- 
half sigma. Those who met the minimum re- 
quirements for practice teaching on their 
first attempt gained only .46 sigma. This 
disparity of gain would naturally lower the 
variability in these tests in 1930. 
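The conversion to sigma units can be checked with a short sketch. Table V does not say which 1928 sigma was used; the sketch assumes it is the sigma of the 125 graduating students at entrance (the 1928 G rows of Table III), an assumption under which the computed values agree with those printed in Table V.

```python
# (test, sigma of 1928 G scores, mean gain of those who met the requirement
#  on the first test, mean gain of those who did not), from Tables III and V
rows = [
    ("English Usage", 14.35, 1.86, 22.42),
    ("Arith. Reas.", 3.34, 1.68, 4.71),
    ("Arith. Comp.", 5.31, 4.00, 8.88),
]

# Express each mean gain as a fraction of the assumed 1928 G sigma.
met = [gain / sigma for _, sigma, gain, _ in rows]
not_met = [gain / sigma for _, sigma, _, gain in rows]

for (test, *_), a, b in zip(rows, met, not_met):
    print(f"{test:<14} {a:.2f} {b:.2f}")
print(f"{'Mean':<14} {sum(met) / len(met):.2f} {sum(not_met) / len(not_met):.2f}")
```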


TABLE VI

ANALYSIS OF GAINS MADE BY STUDENTS WHO MET
THE TRAINING-SCHOOL REQUIREMENTS ON THEIR
FIRST TEST

Name of Test     Mean Gain    PE       Gain/PE
English Usage    1.86         ±.87      2.1
Arith. Reas.     1.68         ±.24      7.0
Arith. Comp.     4.00         ±.35     11.4













Table VI shows that the mean gains in 
score made in arithmetic computation and 
arithmetic reasoning by those who took only 
two tests separated by a period of nearly 
six quarters are significant. The gain made
by this group in English usage is not sta-
tistically significant. This table shows
clearly that the coaching received by the
students is not responsible for all, or nearly
all, of the gain in score on the arithmetic
computation and reasoning tests, since those
who were not specifically coached also made
significant gains on these tests.
Since only two tests were taken in history 
and geography, the gains in score on these 
tests must be credited to some other cause 
than special coaching for the tests. 
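The significance judgments drawn from Table VI amount to dividing each mean gain by its probable error and comparing the ratio with a cut-off. A minimal sketch, assuming the cut-off of four probable errors that the article applies to the Thorndike gain:

```python
# Mean gain and probable error for students who met the requirements on the first test (Table VI)
table_vi = {
    "English Usage": (1.86, 0.87),
    "Arith. Reas.": (1.68, 0.24),
    "Arith. Comp.": (4.00, 0.35),
}
CUTOFF = 4.0  # assumed criterion: a difference of four probable errors


def critical_ratio(gain: float, pe: float) -> float:
    """Gain expressed as a multiple of its probable error."""
    return gain / pe


results = {}
for test, (gain, pe) in table_vi.items():
    ratio = critical_ratio(gain, pe)
    results[test] = (round(ratio, 1), ratio >= CUTOFF)
    verdict = "significant" if ratio >= CUTOFF else "not significant"
    print(f"{test:<14} {ratio:5.1f}  {verdict}")
```

Under this criterion only the English usage gain (2.1) falls short, which matches the reading of Table VI given in the text.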


In the case of history and geography 
there has been an opportunity for those who 
have been interested to increase their knowl- 
edge markedly. The 1930 tests should reflect 
this increase in knowledge. It is probable 
that those who have had no interest in these 
fields will improve their scores but little.
Consequently the variability of the 1930 
tests should be greater in history and geog- 
raphy. 


The unquestionable significance of the 
gain in score made in the two-year period on 
the Thorndike test is of special interest. 
The difference between the mean score of the 
1928 test and the mean score of the 1930 
test is 29.0 times its probable error. When 
the difference is four times the probable
error of the difference, there are 997
chances out of 1000 that the true difference
is greater than zero, i.e., in the observed
direction. It is practically certain, then, that
another group of normal-school students at 
Bellingham would also receive higher scores 
on the Thorndike test when taken six quarters 
after the test taken upon entering the in- 
stitution. 
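The "997 chances out of 1000" figure follows from the normal curve if one probable error is taken as .6745 standard errors and the question is the one-tailed one: how likely the true difference is to lie on the same side of zero as the observed difference. A sketch under those assumptions:

```python
import math

PE_FACTOR = 0.6745  # one probable error = 0.6745 standard errors


def chances_in_1000(multiple_of_pe: float) -> float:
    """One-tailed chances in 1000 that the true difference has the observed sign,
    assuming the sampling distribution of the difference is normal."""
    z = multiple_of_pe * PE_FACTOR  # convert to standard-error units
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
    return 1000 * p


print(round(chances_in_1000(4)))  # 997
print(round(chances_in_1000(29.0)))  # 1000
```

A two-tailed reading of the same four probable errors would give about 993 chances in 1000.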

As has been indicated in Table IV, sig- 
nificant increases in score were shown on 
the 1930 tests over the 1928 tests. From 
the data available the causes for these sig- 
nificant increases cannot be stated defi- 
nitely. It will be valuable, however, to in- 
dicate some of the possible causes. One im- 
portant cause is found in the circumstances 
under which the two sets of tests were ad- 
ministered. The first set of tests was ad- 
ministered to students who, in seventy-two
per cent of the cases, had just completed
high school three months previously. The 
majority of the high schools from which the 
students are graduated use few, if any, ob-
jective tests. The Normal School, on the
contrary, has a well-established system of 
informal objective tests. For the first 
test, then, the pupils were decidedly unac- 
customed to taking objective tests while just 
before graduation they were decidedly skilled
in the technique of taking them. This may 
account for part of the increase in score. 

Another possible reason for the gains 
is the difference in emotional attitude. For 
the first test the whole environment is new 
and strange. This may cause a state of emo- 
tional excitement which may affect the test 
results adversely. In the last tests the
students are presumably much less emotional- 
ly disturbed. 

Entering freshmen have reason to do 
their best on the English usage, arithmetic 
computation, and arithmetic reasoning tests, 
for, until the requisite score is obtained, 
they will not be permitted to do their prac- 
tice teaching and therefore will not be able 
to graduate. There is no unusual pressure
placed on entering freshmen to do their best
in the Thorndike, history, and geography
tests. This may partially account for the 
fact that the gains in the Thorndike and 
history tests are more significant than the 
gains on the arithmetic reasoning, arithme- 
tic computation, and English usage tests. 
The gain in score on the geography test 
practically equals in significance the most 
significant of the gains in the three tool
subjects. 

Many of the students who received low 
scores on the arithmetic and English usage 
tests obtained special coaching in these 
subjects so that they could meet the stand- 
ards required before registering for stu- 
dent teaching. This special coaching should 
aid in increasing the scores in these tests. 

Although there are at least two dupli- 
cate forms of each test, the students who 
were weakest in arithmetic and English usage 
tried the same form of the test more than 
once before they obtained the required score. 
This familiarity with the tests undoubtedly 
tends to increase the scores on these tests. 




SUMMARY AND CONCLUSIONS 


1. Five achievement tests were given to
three groups of students:

(a) a random sampling of eighth-
grade pupils

(b) all students entering the insti-
tution for the first time in
September, 1928

(c) the 125 students of the total
entering group who were about to
graduate from the two-year nor-
mal-course at the end of six
quarters.

Group (c) took the tests twice--upon en-
trance to the institution and just before
graduating from the two-year normal-course.
The "Thorndike Intelligence Examination for
High School Graduates", Part I, was
given to groups (b) and (c).

2. The reliability coefficients of the 
Thorndike, history, geography, and English 
usage tests range from .85 to .95. In view 
of the restricted range of talent and the 
length of working time, these reliabilities 
may be considered quite high. The reliabil- 
ity of the test in arithmetic reasoning is 
.72 and that of arithmetic computation is
.82. These lower reliabilities may be ac-
counted for, in part at least, by the limit-
ed number of items in the tests and the
shorter working times. 

3. Students who enter the normal school
make significantly better scores on all the
tests (with the exception of arithmetic com-
putation) than eighth-grade pupils. Eighth-
grade pupils make a significantly higher
score on the arithmetic computation test than 
students who enter the normal school. 

4. Students who graduate from the two- 
year normal-school course in six quarters 
make significantly higher scores on all the 
tests, when taken just before graduation, 
than eighth-grade pupils. 

5. The differences between the mean 
test scores made by students on tests taken 
upon entrance and again upon graduation, 
range from approximately six to twenty-nine 
times the probable errors of the differences. 

6. The variability of the scores of the
tests taken upon entrance was greater than
the variability of the scores of the tests
taken just prior to graduation in the case
of the Thorndike, arithmetic reasoning,
arithmetic computation, and English usage
tests; but it was less in the history and
geography tests. This change in variability
may be explained in the case of the history
and geography tests by the fact that some
courses in these subjects were elective
while others were required. The students
who were good in these subjects extended
their knowledge while the students who were
poor probably avoided courses in these sub-
jects.

7. The students who were required to
take retests made gains in score which were
twice to ten times as great as the gains
made by students who met the training-school
requirements on their first trial.

8. Some of the probable causes of the
gains in score shown in these tests are:
actual gain in knowledge of the subjects,
less emotional disturbance while taking the
last test, greater test-wiseness, and, in the
case of the Thorndike Examination, greater
motivation on the final test.¹




1. Harry V. Masters and C. C. Upshall, Study of Gains Made by Normal-School Students in Intelligence Tests, Bureau of Research Studies No. 12 (Bellingham, Washington: State Normal School, 1932), 9 pp.











